“Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49 Mins” by Michaël Trazzi

20/12/2025 3 min

Episode Synopsis

An updated METR graph including Claude Opus 4.5 was just published 3 hours ago on X by METR (source).

Same graph but without the log (source).

Thread from METR on X:

We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.

We don’t think the high upper CI bound reflects Opus's actual capabilities: our current task suite doesn’t have enough long tasks to confidently upper bound Opus 4.5's 50%-time horizon. We are working on updating our task suite, and hope to share more details soon.

Based on our experience interacting with Opus 4.5, the model's performance on specific tasks (including some not in our time horizon suite), and its benchmark performance, we would be surprised if further investigation showed Opus had a 20+ hour 50%-time horizon.

Despite its high 50%-time horizon, Opus 4.5's 80%-time horizon is only 27 minutes, similar to past models and below GPT-5.1-Codex-Max's 32 mins. The gap between its 50%- and 80%- horizons reflects a [...]

---
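To make the relationship between the two horizons quoted in the thread concrete, here is a rough sketch. It is not METR's fitting code. It assumes that success probability follows a simple two-parameter logistic curve in log2 of task length, and backs out the slope that such a curve would need in order to pass through both reported points (289 minutes at 50%, 27 minutes at 80%). The function names and the derived slope are illustrative assumptions; only the two horizon values come from the thread.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def implied_slope(h50_min: float, h80_min: float) -> float:
    """Slope (per doubling of task length) of the assumed logistic curve.

    Assumed model, not taken from the source:
        P(success | t) = sigmoid(beta * (log2(h50) - log2(t)))
    Setting P = 0.8 at t = h80 and solving for beta gives the line below.
    """
    return logit(0.8) / math.log2(h50_min / h80_min)


def success_prob(t_min: float, h50_min: float, beta: float) -> float:
    """Predicted success probability on a task of length t_min minutes."""
    z = beta * (math.log2(h50_min) - math.log2(t_min))
    return 1.0 / (1.0 + math.exp(-z))


def horizon(p: float, h50_min: float, beta: float) -> float:
    """Task length (minutes) at which the assumed curve predicts success p."""
    return h50_min / 2 ** (logit(p) / beta)


# Values quoted in the METR thread.
H50 = 4 * 60 + 49  # 4 hrs 49 mins, in minutes
H80 = 27           # minutes

beta = implied_slope(H50, H80)
print(f"implied slope per doubling of task length: {beta:.3f}")
print(f"80% horizon recovered from the curve: {horizon(0.8, H50, beta):.1f} min")
print(f"predicted success on a 1-hour task: {success_prob(60, H50, beta):.2f}")
```

Under this assumed curve, a shallower slope means a wider gap between the 50% and 80% horizons. The sketch only restates the two reported numbers in that form; it does not explain why the gap is wide for Opus 4.5.

---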
First published:
December 19th, 2025

Source:
https://www.lesswrong.com/posts/q5ejXr4CRuPxkgzJD/claude-opus-4-5-achieves-50-time-horizon-of-around-4-hrs-49
---
Narrated by TYPE III AUDIO.

More episodes of the podcast LessWrong (30+ Karma)