When Will AI Models Blackmail You, and Why?

24/06/2025 26 min Season 2 Episode 21

Episode Synopsis

In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventive measures. But do these models *want* this?

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario?
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details

Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void
Non-hype Newsletter: https://signaltonoise.beehiiv.com/