“Broadening the training set should help with alignment” by Seth Herd
Episode Synopsis
Summary

Generalization is one lens on the alignment challenge. We'd like network-based AGI to generalize ethical judgments as well as some humans do. Broadening training is a classic and obvious approach to improving generalization in neural networks. Training sets might be broadened to include decisions like whether to evade human control, how to run the world if the opportunity arises, and how to think about one's self and one's goals. Such training might be useful if it's consistent with capability training. But it could backfire if it amounts to lying to a highly intelligent general reasoning system.

Broader training sets on types of decisions

Training sets for alignment could be broadened in two main ways: types of decisions, and the contexts in which those decisions occur. Any training method could benefit from better training sets, including current alignment training like constitutional AI. The effects of broadening alignment training sets can be investigated empirically, but little work to date directly addresses alignment.

Broadening the training set won't solve alignment on its own. It doesn't directly address mesa-optimization concerns. But it should help as part of a hodge-podge collection of alignment approaches.

This is a brief take on [...]
---
Outline:
(00:10) Summary
(01:40) Alignment generalization is more nuanced than IID vs OOD
(04:02) Generalization for visual and ethical judgments
(07:57) Examples of broadening the training set for alignment
(10:24) Broadened training for more human-like representations
(12:22) Broadening the training set to include reasoning about goals
(15:18) Provisional conclusions and next directions

The original text contained 8 footnotes which were omitted from this narration.
---
First published:
January 5th, 2026
Source:
https://www.lesswrong.com/posts/oveSZmWHjFQbgR4Nn/broadening-the-training-set-should-help-with-alignment
---
Narrated by TYPE III AUDIO.