Listen "Sobering Up on AI Progress w/ Dr. Sean McGregor"
Episode Synopsis
Sean McGregor and I discuss why evaluating AI systems has become so difficult. We cover everything from the breakdown of benchmarking and how incentives shape safety work to what approaches like BenchRisk (his recent NeurIPS paper) and AI auditing aim to fix as systems move into the real world. We also talk about his history and journey in AI safety, including his PhD on ML for public policy, how he started the AI Incident Database, and what he's working on now: AVERI, a non-profit for frontier model auditing.
Chapters
(00:00) - Intro
(02:36) - What's broken about benchmarking
(03:41) - Sean’s wild PhD
(14:28) - The phantom internship
(19:25) - Sean's journey
(22:25) - Market-vs-regulatory modes and AIID
(32:13) - Drunk on AI progress
(38:34) - BenchRisk
(43:20) - Moral hazards and Master Hand
(50:34) - Liability, Section 230, and open source
(59:20) - AVERI
(01:11:30) - Closing thoughts & outro
Links
Sean McGregor's website
AVERI website

BenchRisk
BenchRisk website
NeurIPS paper - Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
NeurIPS paper - AI and the Everything in the Whole Wide World Benchmark

AIID
AI Incident Database website
IAAI paper - Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database
Preprint - Lessons for Editors of AI Incidents from the AI Incident Database
AIAAIC website (another incident tracker)

Hot AI Summer
CACM article - A Few Useful Things to Know About Machine Learning
CACM article - How the AI Boom Went Bust
Undergraduate Thesis - Analyzing the Prospect of an Approaching AI Winter
Tech Genies article - AI History: The First Summer and Winter of AI
CACM article - There Was No ‘First AI Winter’

Measuring Generalization
Neural Computation article - The Lack of A Priori Distinctions Between Learning Algorithms
ICLR paper - Understanding deep learning requires rethinking generalization
ICML paper - Model-agnostic Measure of Generalization Difficulty
Radiology Artificial Intelligence article - Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls
Preprint - Quantifying Generalization Complexity for Large Language Models

Insurers Exclude AI
Financial Times article - Insurers retreat from AI cover as risk of multibillion-dollar claims mount
Tom's Hardware article - Major insurers move to avoid liability for AI lawsuits as multi-billion dollar risks emerge — Recent public incidents have lead to costly repercussions
Insurance Newsnet article - Insurers Scale Back AI Coverage Amid Fears of Billion-Dollar Claims
Insurance Business article - Insurance’s gen AI reckoning has come

Section 230
Section 230 overview
Legal sidebar - Section 230 Immunity and Generative Artificial Intelligence
Bad Internet Bills website
TechDirt article - Section 230 Faces Repeal. Support The Coverage That’s Been Getting It Right All Along.
Privacy Guides video - Dissecting Bad Internet Bills with Taylor Lorenz: KOSA, SCREEN Act, Section 230
Journal of Technology in Behavioral Health article - Social Media and Mental Health: Benefits, Risks, and Opportunities for Research and Practice
Time article - Lawmakers Unveil New Bills to Curb Big Tech’s Power and Profit
House Hearing transcript - Legislative Solutions to Protect Children and Teens Online

Relevant Kairos.fm Episodes
Into AI Safety episode - Growing BlueDot's Impact w/ Li-Lian Ang
muckrAIkers episode - NeurIPS 2024 Wrapped 🌯

Other Links
Encyclopedia of Life website
IBM Watson AI XPRIZE website
ML Commons website
Wikipedia article
More episodes of the podcast Into AI Safety
Getting Agentic w/ Alistair Lowe-Norris
20/10/2025
Growing BlueDot's Impact w/ Li-Lian Ang
15/09/2025
Getting Into PauseAI w/ Will Petillo
23/06/2025
INTERVIEW: StakeOut.AI w/ Dr. Peter Park (3)
25/03/2024
INTERVIEW: StakeOut.AI w/ Dr. Peter Park (2)
18/03/2024
MINISODE: Restructure Vol. 2
11/03/2024