<div class="css-vsf5of">If you build and operate a business-critical application with many service dependencies, then you know how hard it is to prevent latency and errors from ruining your day. The operational challenges stemming from dynamic architectures and interdependent components result in wasted time and revenue loss. Even with hefty investments in observability tools and teams, many companies are still struggling. The root issue? A focus on MTTD (Mean-Time-To-Detect) instead of MTTU (Mean-Time-To-Understand). In this session, you’ll learn:<ul class="carina-rte-public-DraftStyleDefault-ul"><li>How engineering teams use causal reasoning to shift their focus from chasing alerts to proactively ensuring service reliability.</li><li>Why more data isn’t the answer to understanding cause-and-effect in complex systems.</li><li>The best methods for empowering service owners to control reliability instead of chasing it.</li></ul></div>

Rethinking Reliability for Distributed Systems

Speakers

Endre Sara

Ben Yemini