EAGLE-3

14/01/2026 17 min
Episode Synopsis

In this episode:
• Introduction: The Wait for Tokens: Professor Norris and Linda introduce the episode's paper, EAGLE-3, and discuss the persistent bottleneck of autoregressive generation costs in modern LLMs.
• The Speculative Ceiling: Linda explains how previous speculative sampling methods like EAGLE hit a performance wall where adding more training data failed to improve the draft model, and identifies the feature prediction constraint as the culprit.
• Innovation: Training-Time Test: A deep dive into EAGLE-3's core innovation: abandoning feature prediction in favor of direct token prediction, with a training procedure that simulates the conditions the draft model faces at test time.
• Going Deeper: Multi-Layer Fusion: The hosts discuss the second major architectural change, where the draft model stops relying solely on top-layer features and instead fuses low-, mid-, and high-level features for richer context.
• Results: A New Scaling Law: Linda reveals the experimental results, including a speedup of up to 6.5x, SGLang integration, and the discovery of a scaling law where draft models finally benefit from more data.