Listen "Evaluation metrics for reasoning models"
Episode Synopsis
Evaluating models on benchmarks, passing a model vibe check, using formal reasoning to synthesize datasets, and what types of datasets researchers prefer
More episodes of the podcast Pretrained
The sci-fi to startup pipeline
14/01/2026
Can we really trust reasoning
07/01/2026
Our biggest predictions for 2026
19/12/2025
AI's ten big moments of 2025
17/12/2025
Looking back on a year of product market fit
12/12/2025
Looking back on three years of an AI PhD
10/12/2025
OpenReview got "hacked"
03/12/2025
Pretraining is back in vogue with Gemini 3
27/11/2025
Teaching cars about traffic lights
21/11/2025
Pretty pretty please can you hack this
19/11/2025