EAGLE-3
Episode Synopsis
In this episode:
• Introduction: The Wait for Tokens: Professor Norris and Linda introduce the episode's paper, EAGLE-3, and discuss the persistent bottleneck of autoregressive generation costs in modern LLMs.
• The Speculative Ceiling: Linda explains how previous speculative sampling methods like EAGLE hit a performance wall where adding more training data failed to improve the draft model, identifying the feature prediction constraint as the culprit.
• Innovation: Training-Time Test: A deep dive into EAGLE-3's core innovation: abandoning feature prediction in favor of direct token prediction that simulates the testing environment during the training phase.
• Going Deeper: Multi-Layer Fusion: The hosts discuss the second major architectural change, where the model stops relying solely on top-layer features and instead fuses low-, mid-, and high-level features for better context.
• Results: A New Scaling Law: Linda reveals the experimental results, including a 6.5x speedup, SGLang integration, and the discovery of a scaling law where draft models finally benefit from more data.
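The draft-then-verify loop behind speculative sampling can be sketched in a few lines. The two models below are deterministic toy stand-ins (not EAGLE's actual draft head), and real systems verify all drafted positions with a single batched target forward pass plus rejection sampling rather than the greedy matching shown here:

```python
def draft_model(prefix):
    # Hypothetical cheap model: predicts the next token as (last + 1) % 10.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Hypothetical expensive model: agrees with the draft except at every
    # 4th position, where it outputs 0 instead.
    nxt = (prefix[-1] + 1) % 10
    return 0 if len(prefix) % 4 == 0 else nxt

def speculative_step(prefix, k=4):
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, cur = [], list(prefix)
    for _ in range(k):
        t = draft_model(cur)
        drafted.append(t)
        cur.append(t)
    # 2) Verify: accept drafted tokens until the first disagreement with
    #    the target, then substitute the target's token and stop. Several
    #    tokens can thus be committed per expensive verification pass.
    accepted, cur = [], list(prefix)
    for t in drafted:
        want = target_model(cur)
        accepted.append(want)
        cur.append(want)
        if want != t:
            break
    return accepted

print(speculative_step([1], k=4))  # → [2, 3, 4, 0]
```

With k=4 drafted tokens, three are accepted and the fourth is corrected, so one target pass yields four tokens instead of one: this amortization is where the speedup comes from.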
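The multi-layer fusion idea can likewise be sketched with NumPy. All dimensions, layer choices, and the projection here are illustrative placeholders rather than EAGLE-3's published architecture; the point is only the shape flow of concatenating low-, mid-, and high-level features instead of using the top layer alone:

```python
import numpy as np

rng = np.random.default_rng(0)

hidden, num_layers, seq_len = 8, 12, 4  # illustrative sizes

# Hidden states captured from the target model's forward pass:
# one (seq_len, hidden) array per layer. Random stand-ins here.
layer_states = [rng.standard_normal((seq_len, hidden)) for _ in range(num_layers)]

# Pick low-, mid-, and high-level features and concatenate them
# along the feature dimension.
low = layer_states[1]
mid = layer_states[num_layers // 2]
high = layer_states[-1]
fused = np.concatenate([low, mid, high], axis=-1)  # (seq_len, 3 * hidden)

# A learned projection maps the fused features back to the draft
# model's input width (weights are random placeholders here).
W_proj = rng.standard_normal((3 * hidden, hidden)) / np.sqrt(3 * hidden)
draft_input = fused @ W_proj                       # (seq_len, hidden)

print(fused.shape, draft_input.shape)  # → (4, 24) (4, 8)
```

The fused representation gives the draft model both surface-level and abstract context, which is what the hosts credit for lifting the old feature-prediction ceiling.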
More episodes of the podcast Mechanical Dreams
Engram Paper
12/01/2026
From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
09/01/2026
Dion: Distributed Orthonormalized Updates
06/01/2026
Latent State Models of Training Dynamics
28/10/2025
DeepSeek OCR
24/10/2025