#76 – Joe Carlsmith on Scheming AI
Episode Synopsis
Joe Carlsmith is a writer, researcher, and philosopher. He works as a senior research analyst at Open Philanthropy, where he focuses on existential risk from advanced artificial intelligence. He also writes independently about various topics in philosophy and futurism, and holds a doctorate in philosophy from the University of Oxford.
You can find links and a transcript at www.hearthisidea.com/episodes/carlsmith
In this episode we talked about a report Joe recently authored, titled ‘Scheming AIs: Will AIs fake alignment during training in order to get power?’. The report “examines whether advanced AIs that perform well in training will be doing so in order to gain power later” — a behaviour Carlsmith calls ‘scheming’.
We talk about:
Distinguishing ways AI systems can be deceptive and misaligned
Why powerful AI systems might acquire goals that go beyond what they’re trained to do, and how those goals could lead to scheming
Why scheming goals might perform better (or worse) in training than less worrying goals
The ‘counting argument’ for scheming AI
Why goals that lead to scheming might be simpler than the goals we intend
Things Joe is still confused about, and research project ideas
You can get in touch through our website or on Twitter. Consider leaving us an honest review wherever you're listening to this — it's the best free way to support the show. Thanks for listening!