“Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49 Mins” by Michaël Trazzi
Episode Synopsis
An updated METR graph including Claude Opus 4.5 was published on X by METR three hours before this post (source). The same graph without the log scale is also available (source).

Thread from METR on X:

We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.

We don’t think the high upper CI bound reflects Opus's actual capabilities: our current task suite doesn’t have enough long tasks to confidently upper bound Opus 4.5's 50%-time horizon. We are working on updating our task suite, and hope to share more details soon.

Based on our experience interacting with Opus 4.5, the model's performance on specific tasks (including some not in our time horizon suite), and its benchmark performance, we would be surprised if further investigation showed Opus had a 20+ hour 50%-time horizon.

Despite its high 50%-time horizon, Opus 4.5's 80%-time horizon is only 27 minutes, similar to past models and below GPT-5.1-Codex-Max's 32 mins. The gap between its 50%- and 80%- horizons reflects a [...]

---
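To make the gap between the two quoted horizons concrete, here is a minimal sketch that is not taken from the episode text. It assumes success probability follows a logistic curve in log2 of task length, which is a modelling assumption of this example, and the names `implied_slope` and `success_probability` are hypothetical. Only the figures 4 hrs 49 mins and 27 mins come from the thread above.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def implied_slope(h50_min: float, h80_min: float) -> float:
    """Slope (log-odds per doubling of task length) implied by the two horizons.

    Assumes p(t) = sigmoid(slope * (log2(h50) - log2(t))), an illustrative
    model chosen for this sketch rather than a detail stated in the source.
    """
    return logit(0.8) / math.log2(h50_min / h80_min)


def success_probability(task_min: float, h50_min: float, slope: float) -> float:
    """Predicted success rate on a task of the given length under the assumed model."""
    x = slope * (math.log2(h50_min) - math.log2(task_min))
    return 1.0 / (1.0 + math.exp(-x))


# Figures quoted in the thread, converted to minutes.
H50_MIN = 4 * 60 + 49  # 50%-time horizon: 4 hrs 49 mins
H80_MIN = 27           # 80%-time horizon: 27 mins

slope = implied_slope(H50_MIN, H80_MIN)

for minutes in (H80_MIN, 60, H50_MIN):
    p = success_probability(minutes, H50_MIN, slope)
    print(f"{minutes:>4} min task: predicted success {p:.2f}")
```

Under this assumed model, the two horizons differ by a factor of roughly ten in task length, so the implied curve falls off slowly as tasks get longer. A steeper curve would put the 80%-time horizon much closer to the 50%-time horizon.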
First published:
December 19th, 2025
Source:
https://www.lesswrong.com/posts/q5ejXr4CRuPxkgzJD/claude-opus-4-5-achieves-50-time-horizon-of-around-4-hrs-49
---
Narrated by TYPE III AUDIO.