Training a 1 trillion parameter model

03/09/2025 43 min Episode 15
Episode Synopsis


Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism