Rethinking Reliability for Distributed Systems
Tuesday, June 10, 2025 | 7:00 PM – 8:00 PM UTC
If you build and operate a business-critical application with many service dependencies, you know how hard it is to keep latency and errors from ruining your day. The operational challenges that stem from dynamic architectures and interdependent components lead to wasted time and lost revenue. Even with hefty investments in observability tools and teams, many companies still struggle. The root issue? A focus on MTTD (mean time to detect) instead of MTTU (mean time to understand).
In this session, you’ll learn:
- How engineering teams use causal reasoning to shift their focus from chasing alerts to proactively ensuring service reliability.
- Why more data isn’t the answer to understanding cause-and-effect in complex systems.
- The best methods for empowering service owners to control reliability instead of chasing it.
Speakers