Evals and Aliens – How model testing is not a binary affair

17/11/2025 1h 5min

Episode Synopsis

Pete and Alex examine AI model evaluation methodologies, comparing traditional machine learning metrics with the qualitative assessment challenges of large language models. They discuss the collaborative requirements between technical and business teams to establish evaluation criteria for generative AI systems, highlighting the subjective nature of testing conversational outputs versus binary classification tasks. With the help […]

More episodes of the podcast The Confusion Matrix

How the norms use LLMs 03/12/2025

I suppose a hack’s out of the question? – Adventures in LLM Cyber-security 03/11/2025

GenAI, the state of it! Returns! 27/10/2025

No Surprises – Analysis of The GenAI Divide MIT Report 13/10/2025

Terminal Velocity – LLMs and The Inexorable March to Text First UIs 03/10/2025

Peak LLM = Peak Swiss Cheese 22/09/2025

AI Coded Personalised Software – Brave New World or Brand New Apocalypse 08/09/2025

LLM Quality Assurance Part 2 – Blowing Hot and Cold about Demand Side QA 27/08/2025

LLM Quality Assurance Part 1 – Supply Side QA, model accuracy and why cheese is not a fruit 19/08/2025

BI, the state of it! 05/08/2025
