Measuring AI Ability to Complete Long Tasks
Episode Synopsis
By Thomas Kwa et al.

We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.

Source: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
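The extrapolation in the synopsis is plain exponential-growth arithmetic, and a minimal sketch may make it concrete. The only figure taken from the synopsis is the doubling time of around 7 months. The function names and the starting task length used in the example are hypothetical placeholders, not values reported by the authors.

```python
import math

# From the synopsis: task length doubles roughly every 7 months.
DOUBLING_TIME_MONTHS = 7.0


def horizon_after(months, start_hours, doubling_months=DOUBLING_TIME_MONTHS):
    """Task length (in hours) after `months` of steady exponential growth."""
    return start_hours * 2 ** (months / doubling_months)


def months_to_reach(target_hours, start_hours, doubling_months=DOUBLING_TIME_MONTHS):
    """Months needed for the task length to grow from start_hours to target_hours."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    # Hypothetical starting point (an assumption for illustration only).
    start = 1.0  # hours
    # Three doublings are needed to go from 1 hour to 8 hours.
    print(months_to_reach(8.0, start))
    # Projected task length after ten years, if the trend simply continued.
    print(horizon_after(120, start))
```

Under these assumptions, every factor of two in task length costs about 7 months, so the time to reach a target grows only with the logarithm of the gap. The sketch assumes the trend continues unchanged, which the synopsis presents as an extrapolation rather than a guarantee.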
More episodes of the podcast AI Safety Fundamentals
AI and Leviathan: Part I
29/09/2025
d/acc: One Year Later
19/09/2025
A Playbook for Securing AI Model Weights
18/09/2025
Resilience and Adaptation to Advanced AI
18/09/2025
Introduction to AI Control
18/09/2025
The Project: Situational Awareness
18/09/2025
The Intelligence Curse
18/09/2025