[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al

20/01/2024 8 min
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al

Listen "[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al"

Episode Synopsis

This is a linkpost for https://arxiv.org/abs/2401.05566Support ongoing human narrations of LessWrong's curated posts:www.patreon.com/LWCuratedSource:https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training- deceptive-llms-that-persist-throughNarrated for LessWrong by Perrin Walker.Share feedback on this narration.[Curated Post] ✓[125+ Karma Post] ✓

More episodes of the podcast LessWrong (Curated & Popular)