AI Evals & Discovery

23/09/2025 26 min
AI Evals & Discovery

Listen "AI Evals & Discovery"

Episode Synopsis

Building AI products isn’t just about clever prompts and orchestration—it’s about knowing if what you’ve built actually works. In this episode, Teresa Torres and Petra Wille dive deep into AI evals: how they’re defined, why they’re essential, and how teams can implement them to ensure product quality.

Teresa shares her journey building her Interview Coach tool and the hard lessons she learned about evals along the way. From golden datasets and synthetic data to error analysis, code-based checks, and LLM-as-judge methods, you’ll walk away with a clearer picture of how to measure and improve AI products over time.