“Broadening the training set should help with alignment” by Seth Herd

07/01/2026 16 min

Episode Synopsis

Summary

Generalization is one lens on the alignment challenge. We'd like network-based AGI to generalize ethical judgments as well as some humans do. Broadening training is a classic and obvious approach to improving generalization in neural networks.

Training sets might be broadened to include decisions like whether to evade human control, how to run the world if the opportunity arises, and how to think about one's self and one's goals. Such training might be useful if it's consistent with capability training. But it could backfire if it amounts to lying to a highly intelligent general reasoning system.

Broader training sets on types of decisions

Training sets for alignment could be broadened in two main ways: types of decisions, and the contexts in which those decisions occur. Any training method could benefit from better training sets, including current alignment training like constitutional AI. The effects of broadening alignment training sets can be investigated empirically, but little work to date directly addresses alignment.

Broadening the training set won't solve alignment on its own. It doesn't directly address mesa-optimization concerns. But it should[1] help as part of a hodge-podge collection of alignment approaches.

This is a brief take on [...]

---

Outline:
(00:10) Summary
(01:40) Alignment generalization is more nuanced than IID vs OOD
(04:02) Generalization for visual and ethical judgments
(07:57) Examples of broadening the training set for alignment
(10:24) Broadened training for more human-like representations
(12:22) Broadening the training set to include reasoning about goals
(15:18) Provisional conclusions and next directions

The original text contained 8 footnotes which were omitted from this narration.

---
First published:
January 5th, 2026

Source:
https://www.lesswrong.com/posts/oveSZmWHjFQbgR4Nn/broadening-the-training-set-should-help-with-alignment
---
Narrated by TYPE III AUDIO.