“Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49 Mins” by Michaël Trazzi

20/12/2025 3 min


Episode Synopsis

An updated METR graph including Claude Opus 4.5 was just published 3 hours ago on X by METR (source).

Same graph but without the log (source).

Thread from METR on X:

We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.

We don't think the high upper CI bound reflects Opus's actual capabilities: our current task suite doesn't have enough long tasks to confidently upper bound Opus 4.5's 50%-time horizon. We are working on updating our task suite, and hope to share more details soon.

Based on our experience interacting with Opus 4.5, the model's performance on specific tasks (including some not in our time horizon suite), and its benchmark performance, we would be surprised if further investigation showed Opus had a 20+ hour 50%-time horizon.

Despite its high 50%-time horizon, Opus 4.5's 80%-time horizon is only 27 minutes, similar to past models and below GPT-5.1-Codex-Max's 32 mins. The gap between its 50%- and 80%- horizons reflects a [...]
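To make the gap between the two horizons concrete, here is a minimal sketch that is not METR's code or fitting procedure. It assumes a simplified two-parameter logistic curve in log2 of task length, pinned to the two figures quoted in the thread (4 hrs 49 mins at 50% and 27 mins at 80%). The function names and the model form are illustrative assumptions.

```python
import math

# Figures quoted in the METR thread, in minutes.
H50_MIN = 4 * 60 + 49  # 50%-time horizon: 4 hrs 49 mins
H80_MIN = 27           # 80%-time horizon: 27 mins


def implied_slope(h50: float, h80: float) -> float:
    """Slope per doubling of task length, ASSUMING success probability
    follows a logistic curve in log2(task length) through both horizons."""
    logit_80 = math.log(0.8 / 0.2)
    return logit_80 / math.log2(h50 / h80)


def success_prob(t: float, h50: float, slope: float) -> float:
    """Modelled success probability on a task of length t minutes."""
    return 1.0 / (1.0 + math.exp(-slope * (math.log2(h50) - math.log2(t))))


def horizon(p: float, h50: float, slope: float) -> float:
    """Task length (minutes) at which the modelled success probability is p."""
    logit_p = math.log(p / (1.0 - p))
    return h50 / 2 ** (logit_p / slope)


SLOPE = implied_slope(H50_MIN, H80_MIN)

if __name__ == "__main__":
    print(f"ratio of 50% to 80% horizon: {H50_MIN / H80_MIN:.1f}x")
    print(f"implied slope per doubling:  {SLOPE:.3f}")
    for minutes in (15, 27, 60, 289, 600):
        p = success_prob(minutes, H50_MIN, SLOPE)
        print(f"{minutes:>4} min task -> modelled success {p:.2f}")
```

Under this assumed curve, a large ratio between the two horizons corresponds to a shallow slope, meaning success probability declines slowly as tasks get longer. Whether that matches the explanation in the truncated part of the thread cannot be checked from the synopsis alone.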
First published: December 19th, 2025

Source: https://www.lesswrong.com/posts/q5ejXr4CRuPxkgzJD/claude-opus-4-5-achieves-50-time-horizon-of-around-4-hrs-49
---
Narrated by TYPE III AUDIO.

