“My AGI safety research—2025 review, ’26 plans” by Steven Byrnes
Episode Synopsis
Previous: 2024, 2022

“Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter.” –attributed to DL Moody[1]

1. Background & threat model

The main threat model I’m working to address is the same as it's been since I was hobby-blogging about AGI safety in 2019. Basically, I think that:

- The “secret sauce” of human intelligence is a big uniform-ish learning algorithm centered around the cortex;
- This learning algorithm is different from and more powerful than LLMs;
- Nobody knows how it works today;
- Someone someday will either reverse-engineer this learning algorithm, or reinvent something similar;
- And then we’ll have Artificial General Intelligence (AGI) and superintelligence (ASI).

I think that, when this learning algorithm is understood, it will be easy to get it to do powerful and impressive things, and to make money, as long as it's weak enough that humans can keep it under control. But past that stage, we’ll be relying on the AGIs to have good motivations, and not be egregiously misaligned and scheming to take over the world and wipe out humanity. Alas, I claim that the latter kind of motivation is what we should expect to occur, in [...]

---

Outline:
(00:26) 1. Background & threat model
(02:24) 2. The theme of 2025: trying to solve the technical alignment problem
(04:02) 3. Two sketchy plans for technical AGI alignment
(07:05) 4. On to what I've actually been doing all year!
(07:14) Thrust A: Fitting technical alignment into the bigger strategic picture
(09:46) Thrust B: Better understanding how RL reward functions can be compatible with non-ruthless-optimizers
(12:02) Thrust C: Continuing to develop my thinking on the neuroscience of human social instincts
(13:33) Thrust D: Alignment implications of continuous learning and concept extrapolation
(14:41) Thrust E: Neuroscience odds and ends
(16:21) Thrust F: Economics of superintelligence
(17:18) Thrust G: AGI safety miscellany
(17:41) Thrust H: Outreach
(19:13) 5. Other stuff
(20:05) 6. Plan for 2026
(21:03) 7. Acknowledgements

The original text contained 7 footnotes which were omitted from this narration.

---

First published: December 11th, 2025

Source: https://www.lesswrong.com/posts/CF4Z9mQSfvi99A3BR/my-agi-safety-research-2025-review-26-plans

---

Narrated by TYPE III AUDIO.