Listen "Ep. 24 - Operating Excellently"
Episode Synopsis
Episode 0024 - Operating Excellently
Operational excellence goes beyond uptime: it’s about building and operating cloud systems with discipline, automation, and continuous improvement. Carl and Brandon break down what operational excellence really means, drawing a distinction between striving for perfection and building resilient, adaptable systems. They discuss how principles from AWS, Azure, and GCP converge around key practices like repeatable automation, structured change management, and process validation. The episode dives into real-world strategies for automation, incident readiness, and observability, including where and how to insert gates, use feature flags, and integrate infrastructure as code across cloud platforms. From avoiding certificate-induced outages to catching misconfigurations early, the key theme is consistency at scale. The discussion also emphasizes the cultural side: why shared ownership, retrospectives, and iterative postmortems matter just as much as tooling.
Links
Ansible: Ansible community documentation
AWS Docs: Amazon CloudWatch documentation overview
AWS Docs: Operational Excellence whitepaper
AWS Docs: Prescriptive Guidance: Operational Excellence
AWS Docs: Using CloudWatch dashboards and alarms
AWS Docs: Well‑Architected Framework – Operational Excellence pillar
AWS: Getting started with Amazon CloudWatch
Google Cloud: Continuously improve and innovate
Google Cloud: Manage incidents and problems
Google Cloud: Operational Excellence pillar overview
Google Cloud: Operational readiness & performance using CloudOps
HashiCorp Docs: Terraform configuration language reference
HashiCorp Docs: Terraform documentation
Microsoft Docs: Automation of tasks with PowerShell in Power Platform
Microsoft Learn: Azure Automation documentation
Microsoft Learn: Azure Monitor documentation
Microsoft Learn: Operational Excellence maturity model
Microsoft Learn: Operational Excellence overview & quickstart
Microsoft Learn: Operational Excellence principles (maturity model, practices)
Microsoft Learn: PowerShell documentation
PowerShell Universal Docs: PowerShell Universal platform guide
Red Hat Docs: Ansible Automation Platform guide
Visit us at:
twitter.com/CloudChatTech
discord.cloudchat.tech
[email protected]
linkedin.com/company/cloudchat
More episodes of the podcast CloudChat
Ep. 25 - The Sound of Security
08/09/2025
Ep. 22 - What is Cloud Resiliency, Really?
02/06/2025
Ep. 21 - The 9 Circles of Dependency Hell 🔥
05/05/2025
Ep. 20 - The 3 M's of Going to the Cloud
07/04/2025
Ep. 19 - All Your Data Are Belong to Us
03/03/2025
Ep. 18 - We Can Hardly Contain Ourselves!
03/02/2025
Ep. 17 - The Source is with Us
06/01/2025
Ep. 16 - Control All the Things! 🛩️
02/12/2024
Ep. 15 - Dude, Where's My Server?
04/11/2024