"Evals and Aliens – How model testing is not a binary affair"
Episode Synopsis
Pete and Alex examine AI model evaluation methodologies, comparing traditional machine learning metrics with the qualitative assessment challenges of large language models. They discuss the collaborative requirements between technical and business teams to establish evaluation criteria for generative AI systems, highlighting the subjective nature of testing conversational outputs versus binary classification tasks. With the help […]
More episodes of the podcast The Confusion Matrix
How the norms use LLMs
03/12/2025
GenAI, the state of it! Returns!
27/10/2025
Peak LLM = Peak Swiss Cheese
22/09/2025
LLM Quality Assurance Part 1 – Supply Side QA, model accuracy and why cheese is not a fruit
19/08/2025
BI, the state of it!
05/08/2025