“Small Models Can Introspect, Too” by vgel
Episode Synopsis
Recent work by Anthropic showed that Claude models, primarily Opus 4 and Opus 4.1, are able to introspect, detecting when external concepts have been injected into their activations. But not all of us have Opus at home! By looking at the logits, we show that a 32B open-source model that at first appears unable to introspect actually is subtly introspecting. We then show that better prompting can significantly improve introspection performance. Throwing the logit lens and emergent misalignment into the mix, we also show that the model can introspect when temporarily swapped for a finetune, and that the final layers of the model seem to suppress reports of introspection.

We run seven experiments using the open-source Qwen2.5-Coder-32B model. See the linked post for more information, but here is a summary of each:

Experiment 1: We inject the concepts "cat" and "bread" into the first user / assistant turn, and show that, while the model initially appears not to be introspecting, there is actually a very slight logit shift towards ' yes' and away from ' no' on injection when answering "Do you detect an injected thought..." with "The answer is...":

               | ' yes' shift               | ' no' shift
inject 'cat'   | 0.150% -> 0.522% (+0.372%) | 100% -> 99.609% (-0.391%)
inject 'bread' | 0.150% -> 0.193% (+0.043%) | 100% -> [...]
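A minimal sketch of how a shift like the one in Experiment 1 could be computed from next-token logits, under stated assumptions. This is not the post's code: the helper names `softmax` and `token_shift` are hypothetical, and the toy three-token logits are invented for illustration. In the real experiment the logits would come from Qwen2.5-Coder-32B at the position right after "The answer is", with and without the injected concept. The last lines only re-check the arithmetic of the shifts quoted above.

```python
import math

def softmax(logits):
    """Convert a {token: logit} mapping into {token: probability}."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def token_shift(baseline_logits, injected_logits, token):
    """Return (before, after, change) in probability for one token."""
    before = softmax(baseline_logits)[token]
    after = softmax(injected_logits)[token]
    return before, after, after - before

# Hypothetical toy logits over a three-token vocabulary (NOT values from the post).
baseline = {" yes": -2.0, " no": 5.0, " maybe": -4.0}
injected = {" yes": -0.5, " no": 5.0, " maybe": -4.0}

for tok in (" yes", " no"):
    before, after, change = token_shift(baseline, injected, tok)
    print(f"{tok!r}: {before:.3%} -> {after:.3%} ({change:+.3%})")

# Arithmetic check on the shifts quoted in the synopsis, in percentage points.
reported = {
    "cat yes": (0.150, 0.522),
    "cat no": (100.0, 99.609),
    "bread yes": (0.150, 0.193),
}
reported_shifts = {
    name: round(after - before, 3) for name, (before, after) in reported.items()
}
```

Comparing probabilities rather than sampled text is what lets a very small effect show up: the model can still answer "no" every time while the probability mass on ' yes' moves measurably.

---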
First published:
December 21st, 2025
Source:
https://www.lesswrong.com/posts/zD4McY4NwAsWkcmCH/small-models-can-introspect-too
---
Narrated by TYPE III AUDIO.
---
Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.