Training a 1 trillion parameter model

03/09/2025 43 min Episode 15

Episode Synopsis

Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism

More episodes of the Pretrained podcast

The sci-fi to startup pipeline 14/01/2026

Can we really trust reasoning 07/01/2026

Our biggest predictions for 2026 19/12/2025

AI's ten big moments of 2025 17/12/2025

Looking back on a year of product market fit 12/12/2025

Looking back on three years of an AI PhD 10/12/2025

OpenReview got "hacked" 03/12/2025

Pretraining is back in vogue with Gemini 3 27/11/2025

Teaching cars about traffic lights 21/11/2025

Pretty pretty please can you hack this 19/11/2025