"Uptime Labs and the Multi-Party Dilemma (Part II)"
Episode Synopsis
In Part II of the Multi-Party Dilemma (MPD) drill retrospective, we reconvene to dig deeper into the implications and nuances of the simulated incident exercise hosted on the Uptime Labs platform. Eric Dobbs (incident analyst), Alex Elman (deputy IC), and Sarah Butt (incident commander) continue their debrief with Courtney, reflecting on how team behavior evolved under stress, the importance of expertise in managing non-technical aspects of an incident such as saturation, and how deeply held assumptions often go unspoken until tested under pressure.

This episode emphasizes the complex social and cognitive dimensions of incident response, such as how people coordinate, communicate, and construct shared understanding. It highlights the value of analyzing drills not for failure points, but for what they reveal about real work, adaptation, and human coordination.

Key Highlights

Incident Analysis as a Practice: Eric Dobbs emphasized understanding how people make sense of unfolding events, rather than judging decisions in hindsight. The goal is to study “why it made sense at the time,” not what was “right” or “wrong.”

Drills Expose Hidden Assumptions: Even experienced responders bring unspoken mental models into incidents. The drill revealed assumptions about communication flows, authority boundaries, and vendor interactions that were not made explicit in planning.

The Value of Human Expertise: Everyone involved in this incident brought an unparalleled level of expertise to the work. This kind of expertise often goes unnoticed or is taken for granted, yet it is precisely what makes for smoother, better-coordinated, and sometimes faster incident response.

Importance of Framing: The way questions are asked in retrospectives can shape what is revealed. For example, “What made that hard?” is more productive than “What did you miss?” Reframing incidents around constraints and tradeoffs leads to deeper insight.

Team Learning and Culture: Safe, high-trust environments enable better learning during drills. Psychological safety allows team members to admit confusion or raise alternate interpretations during real incidents.

Resources and References

Episode I
Model of Overload/Saturation as part of the Theory of Graceful Extensibility
Lorin's Law

More episodes of the podcast The VOID
Canva and the Thundering Herd
14/05/2025
Episode 8: A Tale of A Near Miss
28/02/2025
Episode 7: When Uptime Met Downtime
30/01/2025
Episode 6: Laura Nolan and Control Pain
25/04/2023
Episode 3: Spotify and A Year of Incidents
20/10/2022
Episode 1: Honeycomb and the Kafka Migration
01/11/2021