Spark DataFrame Documentation
Episode Synopsis
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform additional optimizations. There are several ways to interact with Spark SQL, including SQL and the Dataset API. When computing a result, the same execution engine is used, regardless of which API or language you use to express the computation. This unification means that developers can easily switch back and forth between APIs, choosing whichever provides the most natural way to express a given transformation.
More episodes of the podcast Programmers Quickie
Http 123
04/10/2025
🌐 Scrapingdog: Web Scraping
10/03/2025
🧊 BigData - Apache Iceberg and Streaming
09/03/2025
📊 RDS PostgreSQL vs Redshift
06/03/2025
📚 DevOps - Terraform Providers
25/02/2025
🐳 Startups - Docker Compose
24/02/2025
💡Client - Why Flux
23/02/2025
🌐 Client - jsdom
22/02/2025
⏱️ java.util.Clock: Mocking Time
18/02/2025