The Judge Model Diaries: Judging the Judges
Episode Synopsis
Your LLM gave a great answer. But who decides what “great” means?
In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story.
Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.