The Resilience Problem

27/04/2020
The Resilience Problem

Listen "The Resilience Problem"

Episode Synopsis



…we have educated generations of computer scientists on the paradigm that analysis of algorithm only means analyzing their computational efficiency. As Wikipedia states: "In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms—the amount of time, storage, or other resources needed to execute them." In other words, efficiency is the sole concern in the design of algorithms. … What about resilience? —Moshe Y. Vardi

This quote set me to thinking about how efficiency and resilience might interact, or trade off against one another, in networks. The most obvious extreme cases are two routers connected via a single long-haul link and the highly parallel data center fabrics we build today. Obviously adding a second long-haul link would improve resilience—but at what cost in terms of efficiency? Its also obvious highly meshed data center fabrics have plenty of resilience—and yet they still sometimes fail. Why?