Takes on "Alignment Faking in Large Language Models"

18/12/2024 1h 27min
Episode Synopsis

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/