“Defending Against Model Weight Exfiltration Through Inference Verification” by Roy Rinberg

15/12/2025 18 min

Episode Synopsis

Authors: Roy Rinberg, Adam Karvonen, Alex Hoover, Daniel Reuter, Keri Warr

Arxiv paper link

One Minute Summary

Anthropic has adopted upload limits to prevent model weight exfiltration. The idea is simple: model weights are very large and text outputs are small, so capping the output bandwidth makes transferring the weights take a long time. The problem is that inference servers now generate an enormous volume of tokens (on the order of 1 TB of tokens per day), and the output text channel is the one channel you can't easily restrict.

Nonetheless, in this work we find that it's possible to dramatically limit the amount of information an adversary can send using those output tokens. This is because LLM inference is nearly deterministic: if you fix the sampling seed and regenerate an output, over ~98% of tokens match exactly. This means an attacker attempting to send secret information via steganography (the practice of embedding hidden messages inside otherwise normal-looking text) has very little entropy in the user channel to work with. We show that steganographic exfiltration can be limited to <0.5% of the total information being sent through the user channel (e.g. from 1 TB/day to 5 GB/day), extending exfiltration [...]

---

Outline:

(01:43) Paper: Verifying LLM Inference to Prevent Model Weight Exfiltration
(04:28) The Key Insight: LLM Inference is mostly Deterministic
(06:42) Not all tokens are equally likely, even under non-determinism
(09:14) The Verification Scheme
(10:58) Headline Results: Information-Theoretic Bounds on Exfiltration
(13:52) Other Applications of Inference Verification
(15:31) Limitations
(16:10) Seeing this work in production
(17:55) Resources
(18:13) How to Cite

The original text contained 2 footnotes which were omitted from this narration.

---
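The bandwidth arithmetic in the One Minute Summary can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not the paper's method: the 1 TB/day output volume and the 0.5% bound are taken from the summary, while the model weight size (`WEIGHTS_BYTES`) and the function names are hypothetical values chosen for the example.

```python
# Back-of-the-envelope sketch of the exfiltration arithmetic in the summary.
# The 1 TB/day output volume and the 0.5% bound come from the synopsis.
# WEIGHTS_BYTES is a HYPOTHETICAL model size, chosen only for illustration.

TB = 10**12
GB = 10**9


def usable_bytes_per_day(output_bytes_per_day: float, steg_fraction: float) -> float:
    """Upper bound on the bytes per day an attacker can hide in the outputs."""
    return output_bytes_per_day * steg_fraction


def days_to_exfiltrate(weights_bytes: float, bytes_per_day: float) -> float:
    """Days needed to move the weights at a given covert bandwidth."""
    return weights_bytes / bytes_per_day


OUTPUT_BYTES_PER_DAY = 1 * TB   # from the summary: ~1 TB of output per day
STEG_FRACTION = 0.005           # from the summary: <0.5% usable for steganography
WEIGHTS_BYTES = 1 * TB          # hypothetical weight size (assumption)

capped = usable_bytes_per_day(OUTPUT_BYTES_PER_DAY, STEG_FRACTION)
unverified_days = days_to_exfiltrate(WEIGHTS_BYTES, OUTPUT_BYTES_PER_DAY)
verified_days = days_to_exfiltrate(WEIGHTS_BYTES, capped)

print(f"covert bandwidth with verification: {capped / GB:.1f} GB/day")
print(f"days without verification: {unverified_days:.0f}")
print(f"days with verification:    {verified_days:.0f}")
```

Under these assumptions, the 0.5% bound stretches the exfiltration time by a factor of 200, whatever the actual weight size is; only the absolute number of days depends on the hypothetical `WEIGHTS_BYTES`.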
First published:
December 15th, 2025

Source:
https://www.lesswrong.com/posts/7i33FDCfcRLJbPs6u/defending-against-model-weight-exfiltration-through-1
---
Narrated by TYPE III AUDIO.