SE-Radio Episode 272: Frances Perry on Apache Beam

25/10/2016 57 min
Episode Synopsis

Jeff Meyerson talks with Frances Perry about Apache Beam, a unified model for batch and stream processing. Topics include the history of batch and stream processing, from MapReduce to the Lambda Architecture to the more recent Dataflow model, originally defined in a Google paper. Dataflow addresses the problem of event-time skew using watermarks and other methods that Jeff and Frances discuss. Apache Beam lets users define their pipelines in a way that is agnostic of the underlying execution engine, similar to how SQL provides a unified language for databases. The aim is to reduce the churn and repeated work that have occurred in the rapidly evolving stream processing ecosystem.
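
To make the watermark idea from the synopsis concrete, here is a minimal sketch in plain Python. It is not Apache Beam's API and not taken from the episode. It assumes fixed 60-second event-time windows and a simple heuristic watermark that trails the newest event time seen by 30 seconds. The names `process`, `WINDOW_SIZE`, and `ALLOWED_SKEW` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Both values are assumptions made for this illustration.
WINDOW_SIZE = 60    # length of each fixed event-time window, in seconds
ALLOWED_SKEW = 30   # how far the watermark trails the newest event time seen


def process(events):
    """Count events per fixed event-time window.

    events: iterable of (event_time_seconds, payload) in arrival order,
    which may differ from event-time order (that gap is the skew).
    Returns (emitted, late): emitted is a list of (window_start, count),
    late is a list of events that arrived after their window was closed.
    """
    open_windows = defaultdict(int)
    emitted = []
    late = []
    watermark = float("-inf")

    for event_time, payload in events:
        window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE

        # The watermark already passed this window's end: the event is late.
        if window_start + WINDOW_SIZE <= watermark:
            late.append((event_time, payload))
            continue

        open_windows[window_start] += 1

        # Heuristic watermark: an estimate that no older events remain.
        watermark = max(watermark, event_time - ALLOWED_SKEW)

        # Emit every open window whose end the watermark has passed.
        for start in sorted(open_windows):
            if start + WINDOW_SIZE <= watermark:
                emitted.append((start, open_windows.pop(start)))

    # End of input: flush whatever is still open.
    for start in sorted(open_windows):
        emitted.append((start, open_windows.pop(start)))

    return emitted, late


if __name__ == "__main__":
    arrivals = [(5, "a"), (70, "b"), (50, "c"), (95, "d"), (10, "e"), (130, "f")]
    print(process(arrivals))
```

In this run, the event at time 50 arrives out of order but is still counted in the first window, because the watermark has not yet passed that window's end. The event at time 10 arrives after the window has closed and is reported as late. A real engine would offer more options for late data than simply dropping it.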