706: Large Language Model Leaderboards and Benchmarks

Super Data Science: ML & AI Podcast with Jon Krohn

18/08/2023 33 min

Episode Synopsis

In this episode, Caterina Constantinescu dives deep into Large Language Models (LLMs), spotlighting top leaderboards, evaluation benchmarks, and real-world user perceptions. Plus, discover the challenges of dataset contamination and the intricacies of platforms like HELM and Chatbot Arena.

Additional materials: www.superdatascience.com/706

Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

More episodes of the podcast Super Data Science: ML & AI Podcast with Jon Krohn

954: Recap of 2025 and Wishing You a Wonderful 2026 02/01/2026

953: Beyond “Agent Washing”: AI Systems That Actually Deliver ROI, with Dell’s Global CTO John Roese 30/12/2025

952: How to Avoid Burnout and Get Promoted, with “The Fit Data Scientist” Penelope Lafeuille 26/12/2025

951: Context Engineering, Multiplayer AI and Effective Search, with Dropbox’s Josh Clemm 23/12/2025

950: Happy Holidays from All of Us at the SuperDataScience Podcast 19/12/2025

949: Why AI Keeps Failing Society, with Stanford professor Alex “Sandy” Pentland 16/12/2025

948: In Case You Missed It in November 2025 12/12/2025

947: How to Get Hired at Top Firms like Netflix and Spotify, with Jeff Li 09/12/2025

946: How Robotaxis Are Transforming Cities 05/12/2025

945: AI is a Joke, with Joel Beasley 02/12/2025

View all episodes
