Listen "GoldenMagikCarp"
Episode Synopsis
These two sources from LessWrong explore the phenomenon of "glitch tokens" in Large Language Models (LLMs) such as GPT-2, GPT-3, and GPT-J. The authors, Jessica Rumbelow and mwatkins, detail how these unusual strings, often scraped from sources like Reddit or game logs, trigger anomalous behaviors in the models, such as evasion, bizarre responses, or a refusal to repeat the token. They hypothesize that these issues stem from the tokens being rarely or poorly represented in the models' training data, leading to unpredictable outcomes and non-deterministic responses even at zero temperature. The second source provides further technical details and more recent findings, categorizing these tokens and investigating their proximity to the centroid of the embedding space, offering deeper insight into this peculiar aspect of LLM behavior.

Sources:
1) February 2023 - https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
2) February 2023 - https://www.lesswrong.com/posts/Ya9LzwEbfaAMY8ABo/solidgoldmagikarp-ii-technical-details-and-more-recent
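As a rough illustration of the centroid-proximity analysis mentioned in the second post, the sketch below loads GPT-2's token embedding matrix with the Hugging Face transformers library and ranks tokens by their distance from the mean embedding. The model name "gpt2" and the cutoff of 20 tokens are illustrative assumptions, not details taken from the posts.

# Illustrative sketch (assumes torch and transformers are installed)
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # model choice is an assumption
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix, shape (vocab_size, hidden_dim)
embeddings = model.wte.weight.detach()
centroid = embeddings.mean(dim=0)

# Euclidean distance of each token's embedding from the centroid
distances = torch.norm(embeddings - centroid, dim=1)

# Print the tokens whose embeddings sit closest to the centroid
for idx in torch.argsort(distances)[:20]:
    print(repr(tokenizer.decode([int(idx)])), float(distances[idx]))

Running a check like this surfaces the sort of under-trained tokens the posts discuss, though the exact ranking will depend on the model and distance metric used.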
More episodes of the podcast AI: AX - introspection
Jailbreaking LLMs - 09/08/2025
PA-LRP & absLRP - 09/08/2025
AttnLRP: Explainable AI for Transformers - 09/08/2025