Evaluation metrics for reasoning models

31/07/2025 32 min Episode 6

Episode Synopsis

Evaluating models on benchmarks, passing a model vibe check, using formal reasoning to synthesize datasets, and what types of datasets researchers prefer.

More episodes of the podcast Pretrained

The sci-fi to startup pipeline 14/01/2026

Can we really trust reasoning 07/01/2026

Our biggest predictions for 2026 19/12/2025

AI's ten big moments of 2025 17/12/2025

Looking back on a year of product market fit 12/12/2025

Looking back on three years of an AI PhD 10/12/2025

OpenReview got "hacked" 03/12/2025

Pretraining is back in vogue with Gemini 3 27/11/2025

Teaching cars about traffic lights 21/11/2025

Pretty pretty please can you hack this 19/11/2025
