Jonathan Choi, “Large Language Models Are Unreliable Judges”
Episode Synopsis
Jonathan H. Choi, Large Language Models Are Unreliable Judges. Solum’s Download of the Week for April 12, 2025. Available on SSRN. This is a synthetic academic workshop generated using enTalkenator (a variation of the Workshop template and Claude 3.7 Sonnet).

Abstract: “Can large language models (LLMs) serve as ‘AI judges’ that provide answers to legal questions? I conduct the first series of empirical experiments to systematically test the reliability of LLMs as legal interpreters. I find that LLM judgments are highly sensitive to prompt phrasing, output processing methods, and model training choices, undermining their credibility and creating opportunities for motivated judges to cherry-pick results. I also find that post-training procedures used to create the most popular models can cause LLM assessments to substantially deviate from empirical predictions of language use, casting doubt on claims that LLMs elucidate ordinary meaning.”