Listen "Scaling Laws"
Episode Synopsis
Scaling laws describe how language model performance improves as model size, training data, and compute increase. These improvements typically follow a power law, so gains are predictable as resources scale up, though with diminishing returns. Compute-optimal training requires balancing model size, data, and compute, and under some formulations favors training large models on comparatively little data and stopping before convergence. To prevent overfitting, dataset size should grow sublinearly with model size. The laws are largely independent of model architecture. Many current large models appear undertrained, suggesting compute would be better spent on a more balanced allocation between parameters and training data.
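To make the power-law and compute-allocation claims concrete, here is a minimal Python sketch. It assumes the commonly cited Kaplan-style loss form L(N) ≈ (N_c/N)^α and the Chinchilla heuristic of roughly 20 training tokens per parameter with C ≈ 6·N·D; the constants and function names are illustrative assumptions, not figures from the episode.

# Minimal sketch (assumed forms, not taken from the episode): a power-law loss
# in model size and a Chinchilla-style compute-optimal split of a FLOP budget.

def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Kaplan-style power law: loss falls predictably as parameter count grows."""
    return (n_c / n_params) ** alpha_n

def compute_optimal_allocation(compute_flops, tokens_per_param=20.0):
    """Split a compute budget C ~ 6*N*D between parameters N and tokens D,
    assuming the widely quoted ~20-tokens-per-parameter Chinchilla ratio."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for c in (1e21, 1e22, 1e23):
        n, d = compute_optimal_allocation(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens, "
              f"predicted loss ~{loss_from_params(n):.3f}")

Under this split, a 10x larger compute budget grows both parameters and tokens by about 3.2x, which is the sense in which models scaled up in size alone, on a fixed dataset, end up undertrained.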
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025