Zero Bubble Pipeline Parallelism

08/07/2024


Episode Synopsis




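The core idea is to split the backward pass into two flows: one computes the gradient with respect to the parameters, and the other computes the gradient with respect to the layer's input (that is, the output of the previous layer). Only the input gradient is needed by the preceding pipeline stage, so the parameter-gradient work can be scheduled into the gaps where a device would otherwise sit idle waiting (the bubble).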
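
To make the split concrete, here is a minimal sketch for a single linear layer y = x W, written in plain Python. The function names (backward_input, backward_weight) and the deferred queue are illustrative assumptions, not code from the paper. It only shows that the two gradients can be computed independently, so the parameter gradient can be postponed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def backward_input(grad_out, weight):
    # Gradient w.r.t. the layer input: grad_out @ W^T.
    # The previous pipeline stage is waiting for this, so it runs first.
    return matmul(grad_out, transpose(weight))


def backward_weight(layer_input, grad_out):
    # Gradient w.r.t. the parameters: x^T @ grad_out.
    # No other stage depends on it, so it can run whenever the device is free.
    return matmul(transpose(layer_input), grad_out)


# Toy values (assumed for illustration): x is 1x2, W is 2x3, grad_y is 1x3.
x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0]]
grad_y = [[1.0, 1.0, 1.0]]

# Step 1: compute the input gradient immediately and pass it upstream.
grad_x = backward_input(grad_y, w)

# Step 2: remember what the parameter gradient needs, but do not compute it yet.
deferred = [(x, grad_y)]

# Step 3: later, when the device would otherwise be idle, drain the queue.
grad_w = [backward_weight(inp, g) for inp, g in deferred][0]

print(grad_x)  # [[3.0, 4.0]]
print(grad_w)  # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```

In a real pipeline the deferred work would be placed by a scheduler across many micro-batches and stages; the queue here only stands in for that decision.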

Read full paper: https://arxiv.org/abs/2401.10241

Tags: Systems and Performance, Deep Learning, Machine Learning

More episodes of the podcast Byte Sized Breakthroughs