Varieties of fake alignment (Section 1.1 of "Scheming AIs")

16/11/2023 17 min
Episode Synopsis

This is section 1.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report: https://arxiv.org/abs/2311.08379
Summary of the report: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power