Evaluating LLMs with Chatbot Arena and Joseph E. Gonzalez

17/12/2024 55 min


Episode Synopsis

In this episode of Gradient Dissent, Joseph E. Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins host Lukas Biewald to explore innovative approaches to evaluating LLMs.

They discuss the concept of vibes-based evaluation, which examines not just accuracy but also the style and tone of model responses, and how Chatbot Arena has become a community-driven benchmark for open-source and commercial LLMs. Joseph shares insights on democratizing model evaluation, refining AI-human interactions, and leveraging human preferences to improve model performance. This episode provides a deep dive into the evolving landscape of LLM evaluation and its impact on AI development.

🎙 Get our podcasts on these platforms:
Apple Podcasts: http://wandb.me/apple-podcasts
Spotify: http://wandb.me/spotify
Google: http://wandb.me/gd_google
YouTube: http://wandb.me/youtube

Follow Weights & Biases:
https://twitter.com/weights_biases
https://www.linkedin.com/company/wandb

Join the Weights & Biases Discord Server:
https://discord.gg/CkZKRNnaf3

More episodes of the podcast Gradient Dissent: Conversations on AI

The CEO Behind the Fastest-Growing AI Inference Company | Tuhin Srivastava 18/11/2025

The Startup Powering The Data Behind AGI 16/09/2025

Arvind Jain on Building Glean and the Future of Enterprise AI 05/08/2025

How DeepL Built a Translation Powerhouse with AI with CEO Jarek Kutylowski 08/07/2025

GitHub CEO Thomas Dohmke on Copilot and the Future of Software Development 10/06/2025

From Pharma to AGI Hype, and Developing AI in Finance: Martin Shkreli’s Journey 20/05/2025

Inside Cursor: The future of AI coding with Co-founder Sualeh Asif 29/04/2025

Inside the Dark Web, AI and Cybersecurity with Christopher Ahlberg CEO of Recorded Future 08/04/2025

AI, autonomy, and the future of naval warfare with Captain Jon Haase, United States Navy 25/03/2025

The rise of AI agents 25/02/2025

