Listen "Training a 1 trillion parameter model"
Episode Synopsis
Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism.
More episodes of the podcast Pretrained
The sci-fi to startup pipeline (14/01/2026)
Can we really trust reasoning (07/01/2026)
Our biggest predictions for 2026 (19/12/2025)
AI's ten big moments of 2025 (17/12/2025)
Looking back on a year of product market fit (12/12/2025)
Looking back on three years of an AI PhD (10/12/2025)
OpenReview got "hacked" (03/12/2025)
Pretraining is back in vogue with Gemini 3 (27/11/2025)
Teaching cars about traffic lights (21/11/2025)
Pretty pretty please can you hack this (19/11/2025)