Listen "GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism"
Episode Synopsis
This episode breaks down the research paper "GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism," which proposes a new method for training very large neural networks by partitioning the model across multiple accelerators and using a novel batch-splitting pipelining algorithm. This approach allows for the efficient training of larger models than previously possible, achieving almost linear speedup with the number of accelerators.

Audio (Spotify): https://open.spotify.com/episode/4zXyQKSdiSUFK7HkAi6pxO?si=eWWrNsURSqGtw6Phf4tpJg
Paper: https://arxiv.org/abs/1811.06965
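To make the batch-splitting idea concrete, below is a minimal, framework-free Python sketch of the micro-batch pipeline schedule: a mini-batch is split into micro-batches, and stage k processes micro-batch m at clock step m + k, so different stages work on different micro-batches at the same time. The names `Stage`, `gpipe_forward`, and `num_micro_batches` are illustrative assumptions, not the paper's actual API; real GPipe also re-materializes activations and accumulates gradients across micro-batches before a single synchronous weight update.

```python
# Illustrative sketch of GPipe-style micro-batch pipelining.
# All names here are hypothetical, chosen for clarity only.

from dataclasses import dataclass


@dataclass
class Stage:
    """One model partition, conceptually pinned to its own accelerator."""
    name: str

    def forward(self, x):
        # Stand-in for the partition's real computation.
        return [v * 2 for v in x]


def gpipe_forward(stages, mini_batch, num_micro_batches):
    """Split the mini-batch into micro-batches and pipeline them through
    the stages: stage k touches micro-batch m at clock step t = m + k."""
    size = len(mini_batch) // num_micro_batches
    micro = [mini_batch[i * size:(i + 1) * size]
             for i in range(num_micro_batches)]
    # At each clock step, every stage that has a ready micro-batch runs;
    # on real hardware these stage executions happen concurrently.
    for t in range(num_micro_batches + len(stages) - 1):
        for k, stage in enumerate(stages):
            m = t - k
            if 0 <= m < num_micro_batches:
                micro[m] = stage.forward(micro[m])
                print(f"step {t}: {stage.name} -> micro-batch {m}")
    # Re-assemble the outputs of all micro-batches.
    return [v for mb in micro for v in mb]


if __name__ == "__main__":
    stages = [Stage("stage0"), Stage("stage1"), Stage("stage2")]
    out = gpipe_forward(stages, list(range(8)), num_micro_batches=4)
    print(out)
```

The printed trace shows the pipeline filling and draining: only the first and last few steps leave stages idle (the "bubble"), which is why the speedup approaches linear as the number of micro-batches grows relative to the number of stages.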
More episodes of the podcast Marvin's Memos
The Scaling Hypothesis - Gwern (17/11/2024)
The Bitter Lesson - Rich Sutton (17/11/2024)
Llama 3.2 + Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models (17/11/2024)
Sparse and Continuous Attention Mechanisms (16/11/2024)
The Intelligence Age - Sam Altman (11/11/2024)