When Will AI Models Blackmail You, and Why?

24/06/2025 26 min Season 2 Episode 21

Episode Synopsis

In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventative measures. But do these models *want* this?

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario?
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details

Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void
Non-hype Newsletter: https://signaltonoise.beehiiv.com/

More episodes of AI Explained Official Podcast

Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection) 10/11/2025

Sora 2 - It will only get more realistic from here 01/10/2025

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings 26/09/2025

ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops 16/09/2025

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana 26/08/2025

GPT-5 has Arrived 07/08/2025

Genie 3: The World Becomes Playable (DeepMind) 05/08/2025

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …) 21/07/2025

Grok 4 - 10 New Things to Know 10/07/2025

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know 12/06/2025

See all episodes
