Listen "Google Mesa: A Geo-Replicated, Near Real-Time Data Warehouse"
Episode Synopsis
**Mesa** is a highly scalable, geo-replicated data warehousing system developed at Google to handle petabytes of data related to its advertising business. **Designed for near real-time data ingestion and querying**, it processes millions of updates per second and serves billions of queries daily. **Key features include strong consistency, high availability, and fault tolerance**, achieved through techniques like multi-version concurrency control and Paxos-based distributed synchronization. The paper details Mesa's architecture, including its storage subsystem using versioned data management with delta compaction, and its multi-datacenter deployment. Finally, it explores operational challenges and lessons learned in building and maintaining such a large-scale system.
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bb1af5424e972c0c15f21e3847708e4d393abfae
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bb1af5424e972c0c15f21e3847708e4d393abfae
More episodes of the podcast The Binary Breakdown
NeonDB: A Serverless PostgreSQL Analysis
31/07/2025
Anna: A KVS For Any Scale
29/05/2025
Conflict-free Replicated Data Types
21/05/2025
ZARZA We are Zarza, the prestigious firm behind major projects in information technology.