From pandas to Arrow: Wes McKinney on the Future of Data Infrastructure
Episode Synopsis
Summary

In this episode of Tech on the Rocks, Kostas and Nitay sit down with Wes McKinney, the creator of pandas, co-creator of Apache Arrow and Ibis, and a long-time leader in the Python data ecosystem. Wes walks us through his journey from building pandas in 2008 to rethinking how we represent and move columnar data with Arrow, and explains why Arrow is fundamentally different from file formats like Parquet and ORC.

We get into the future of data file formats, DataFusion and the new generation of query engines, the rise of open data lakes (Iceberg, Delta, Hudi), and why “big metadata” is becoming just as important as big data. Wes also shares candid thoughts on open source sustainability, how companies and infrastructure projects really survive, and how AI coding agents like Claude Code are changing the day-to-day work of software engineers, especially for complex systems work.

If you care about the foundations of modern data infrastructure, or you’ve ever typed import pandas as pd, this is an episode you won’t want to miss.

Chapters

00:00 Intro: Wes McKinney & his journey in the Python data ecosystem
02:15 How pandas evolved & why UX first mattered for data science
06:14 Open source sustainability, funding & the Posit model
07:31 From pandas to Datapad, Cloudera & the origins of Apache Arrow and Ibis
13:38 What is Apache Arrow? In-memory columnar data, batches & schemas
22:23 Inside Arrow IPC: zero-copy, FlatBuffers & cross-language interop
24:34 Arrow vs Parquet: columnar memory format vs columnar storage format
29:28 The next generation of columnar file formats & GPU-friendly encodings
36:03 Big metadata, table formats & the rise of Iceberg/Delta/Hudi
43:05 Rethinking data systems: from big data to DuckDB, Rust & “no JVM” stacks
54:11 DataFusion as a modular Rust query engine for modern startups
57:58 Open source, the composable data stack & why infra is “AI-resistant”
01:00:07 Vibe-coding with AI agents: using Claude Code in real projects
01:09:49 AI, open source maintainers & the risks of AI-generated contributions
01:18:57 Bridging LLMs and data: ADBC, data context & the future of infra + AI