Listen "S1: Simple Test-time Scaling"
Episode Synopsis
'S1' refers to simple test-time scaling, an efficient approach to enhance language model reasoning with minimal resources. It involves training a model on a small, carefully curated dataset like s1K and using budget forcing to control test-time compute. Budget forcing enforces maximum or minimum thinking tokens by appending delimiters or the word "Wait". The s1-32B model, developed using this method, outperforms other models on competition math questions. The approach combines a curated dataset with a straightforward test-time technique, leading to strong reasoning performance and effective test-time scaling.
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025
ZARZA We are Zarza, the prestigious firm behind major projects in information technology.