Listen "GoldenMagikCarp"
Episode Synopsis
These two sources from LessWrong explore the phenomenon of "glitch tokens" in Large Language Models (LLMs) such as GPT-2, GPT-3, and GPT-J. The authors, Jessica Rumbelow and mwatkins, detail how these unusual strings, often scraped from sources like Reddit or game logs, trigger anomalous behaviors in the models, such as evasion, bizarre responses, or a refusal to repeat the token. They hypothesize that these issues stem from the tokens being rarely or poorly represented in the models' training data, leading to unpredictable outcomes and non-deterministic responses even at zero temperature. The second source provides further technical details and more recent findings, categorizing these tokens and investigating their proximity to the centroid of the embedding space, offering deeper insight into this peculiar aspect of LLM behavior.

Sources:
1) February 2023 - https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
2) February 2023 - https://www.lesswrong.com/posts/Ya9LzwEbfaAMY8ABo/solidgoldmagikarp-ii-technical-details-and-more-recent
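As a rough illustration of the centroid-proximity analysis mentioned in the second post, the sketch below loads GPT-2's token embedding matrix with the Hugging Face transformers library and ranks tokens by their distance from the mean embedding. The model name "gpt2" and the cutoff of 20 tokens are illustrative assumptions, not details taken from the posts.

# Illustrative sketch (assumes torch and transformers are installed)
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # model choice is an assumption
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix, shape (vocab_size, hidden_dim)
embeddings = model.wte.weight.detach()
centroid = embeddings.mean(dim=0)

# Euclidean distance of each token's embedding from the centroid
distances = torch.norm(embeddings - centroid, dim=1)

# Print the tokens whose embeddings sit closest to the centroid
for idx in torch.argsort(distances)[:20]:
    print(repr(tokenizer.decode([int(idx)])), float(distances[idx]))

Running a check like this surfaces the sort of under-trained tokens the posts discuss, though the exact ranking will depend on the model and distance metric used.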
More episodes of the podcast AI: AX - introspection
Jailbreaking LLMs - 09/08/2025
PA-LRP & absLRP - 09/08/2025
AttnLRP: Explainable AI for Transformers - 09/08/2025