Takes on "Alignment Faking in Large Language Models"

18/12/2024 1h 27min

Episode Synopsis

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/

More episodes of the podcast Joe Carlsmith Audio

How human-like do safe AI motivations need to be? 12/11/2025

Leaving Open Philanthropy, going to Anthropic 03/11/2025

Controlling the options AIs can pursue 29/09/2025

Giving AIs safe motivations 18/08/2025

The stakes of AI moral status 21/05/2025

Can we safely automate alignment research? 30/04/2025

AI for AI safety 14/03/2025

Paths and waystations in AI safety 11/03/2025

When should we worry about AI power-seeking? 19/02/2025

What is it to solve the alignment problem? 13/02/2025
