"What failure looks like" by Paul Christiano

27/03/2023 18 min


Episode Synopsis

https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.

I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts:

Part I: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.")

Part II: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of optimization daemons.)

I think these are the most important problems if we fail to solve intent alignment.

In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several years.

More episodes of the podcast LessWrong (Curated & Popular)

"Scientific breakthroughs of the year" by technicalities 17/12/2025

"A high integrity/epistemics political machine?" by Raemon 17/12/2025

"How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)" by Kaj_Sotala 16/12/2025

“My AGI safety research—2025 review, ’26 plans” by Steven Byrnes 15/12/2025

“Weird Generalization & Inductive Backdoors” by Jorio Cocola, Owain_Evans, dylan_f 14/12/2025

“Insights into Claude Opus 4.5 from Pokémon” by Julian Bradshaw 13/12/2025

“The funding conversation we left unfinished” by jenn 13/12/2025

“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck 11/12/2025

“Little Echo” by Zvi 09/12/2025

“A Pragmatic Vision for Interpretability” by Neel Nanda 08/12/2025


