Learning from Failure at Scale

13/04/2020

Episode Synopsis

One difficulty for the average network operator trying to understand failure rates and their causes is that they simply do not have enough devices, or enough incidents, to draw informed conclusions. With only a couple of dozen switches, it is hard to tell how often software defects take a device down compared with human error (Mean Time Between Mistakes, or MTBM). As networks grow larger, however, more data becomes available and more interesting observations can be made. A recent paper written in conjunction with Facebook uses data from Facebook’s data center fabrics to examine the rate and severity of different kinds of failures, and the results are fairly interesting.

More episodes of the podcast DESIGN – rule 11 reader

Hedge 265: Out of Band Networks 04/04/2025

Architecture and Process 12/04/2024

Simple or Complex? 19/09/2023

Hedge 144: IPv6 Lessons Learned 25/08/2022

Route Servers and Loops 16/08/2022

RFC9199: Lessons in Large-scale Service Deployment 08/08/2022

Hedge 134: Ten Things 15/06/2022

Revisiting BGP Convergence 06/06/2022

BGP Policies (Part 2) 14/03/2022

BGP Policies (Part 1) 07/03/2022
