VerticalPodAutoscaler Went Rogue: It Took Down Our Cluster, with Thibault Jamet

16/09/2025 37 min Season 7 Episode 6

Episode Synopsis

Running 30 Kubernetes clusters serving 300,000 requests per second sounds impressive until your Vertical Pod Autoscaler goes rogue and starts evicting critical system pods in an endless loop.

Thibault Jamet shares the technical details of debugging a complex VPA failure at Adevinta, where webhook timeouts triggered continuous pod evictions across their multi-tenant Kubernetes platform.

You will learn:

- VPA architecture deep dive: How the recommender, updater, and mutating webhook components interact and what happens when the webhook fails
- Hidden Kubernetes limits: How default QPS and burst rate limits in the Kubernetes Go client can cause widespread failures, and why these aren't well documented in Helm charts
- Monitoring strategies for autoscaling: What metrics to track for webhook latency and pod eviction rates to catch similar issues before they become critical

Sponsor

This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/rf1pbWXdN
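To make the "hidden Kubernetes limits" point concrete, here is a minimal sketch of how a client-side token bucket delays a backlog of API requests. It is not code from the episode. It assumes the Kubernetes Go client's commonly cited defaults of 5 QPS with a burst of 10; the function name `release_times`, the backlog of 100 requests, and the 10-second deadline are hypothetical values chosen only for illustration.

```python
def release_times(n_requests, qps=5.0, burst=10):
    """Model a token bucket that starts full, with n_requests all queued at t=0.

    Request i (0-indexed) needs i + 1 tokens in total. The bucket has supplied
    burst + qps * t tokens by time t, so the request is released at
    t = max(0, (i + 1 - burst) / qps) seconds.
    """
    return [max(0.0, (i + 1 - burst) / qps) for i in range(n_requests)]


def count_late(times, deadline_seconds):
    """Count requests released strictly after the given deadline."""
    return sum(1 for t in times if t > deadline_seconds)


if __name__ == "__main__":
    # Hypothetical backlog: 100 API calls issued at once by one client.
    times = release_times(100)
    print("last request released after", times[-1], "seconds")
    # Hypothetical 10-second deadline, e.g. a caller that gives up waiting.
    print("requests released after the deadline:", count_late(times, 10.0))
```

Under these assumptions the first 10 requests go out immediately and the rest trickle out at 5 per second, so a component that must answer within a fixed deadline can miss it purely because of client-side throttling, even when the API server itself is healthy.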

More episodes of the podcast KubeFM