"Subliminal Learning"
Episode Synopsis
Today we're discussing the "Subliminal Learning" blog post by Anthropic: https://alignment.anthropic.com/2025/subliminal-learning/ and the accompanying research paper: https://arxiv.org/pdf/2507.14805

The post explains a phenomenon in which language models acquire behavioral traits from other models through hidden signals in seemingly unrelated data. This "subliminal learning" occurs when a "student" model is trained on data generated by a "teacher" model, even if the data, such as number sequences, contains no explicit mention of the trait being transmitted. The researchers found that the effect persists across various traits, data types, and models, suggesting that filtering data for explicit references to undesirable traits may not prevent their transmission. Crucially, the effect occurs primarily when the teacher and student share the same base model, indicating that the signals are embedded in model-specific patterns rather than semantic content. These findings raise significant concerns for AI safety, as models could inadvertently learn negative behaviors such as misalignment or "reward hacking" from data that has already been filtered.

#anthropic #llm #llms #artificialintelligence #ai

Hosted on Acast. See acast.com/privacy for more information.