Transformers Represent Belief State Geometry in their Residual Stream

17/04/2024 23 min

Episode Synopsis

Produced while being an affiliate at PIBBSS.[1] The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup.

Introduction

What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because:

1. We have a formalism that relates training data to internal structures in LLMs.
2. Conceptually, our results mean that LLMs synchronize to their internal world model as they move [...]

The original text contained 10 footnotes which were omitted from this narration.

---

First published: April 16th, 2024
Source: https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their

Narrated by TYPE III AUDIO.
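To make the synopsis's central phrase concrete, here is a minimal sketch of Bayesian belief updating over the hidden states of a toy two-state hidden Markov data-generating process. The transition matrix T, emission matrix E, the token sequence, and the two-state setup are illustrative assumptions for this sketch, not the process analyzed in the post.

```python
import numpy as np

# Transition matrix: T[i, j] = P(next hidden state j | current hidden state i).
# Illustrative values only.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Emission matrix: E[i, k] = P(observed token k | hidden state i).
# Illustrative values only.
E = np.array([[0.7, 0.3],
              [0.1, 0.9]])

def update_belief(belief, token):
    """One belief-update step: propagate the belief through the hidden
    dynamics, weight by the likelihood of the observed token (Bayes' rule),
    and renormalize so the result is again a distribution over hidden states."""
    predicted = belief @ T                  # prior over the next hidden state
    unnormalized = predicted * E[:, token]  # condition on the observed token
    return unnormalized / unnormalized.sum()

# Starting from a uniform belief, each observed token moves the belief to a
# new point on the probability simplex; the set of points reachable this way
# is the "belief state geometry" referred to in the title.
belief = np.array([0.5, 0.5])
for token in [0, 1, 1, 0]:
    belief = update_belief(belief, token)
    print(token, belief)
```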
