The Judge Model Diaries: Judging the Judges
Episode Synopsis
Your LLM gave a great answer. But who decides what “great” means?
In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story.
Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.