ALiBi: Attention with Linear Biases Enables Length Extrapolation

01/11/2025 12 min

Listen "ALiBi: Attention with Linear Biases Enables Length Extrapolation"

Episode Synopsis

This April 22, 2022 paper, a collaboration between the University of Washington, Facebook AI, and the Allen Institute for AI, introduces Attention with Linear Biases (ALiBi), a novel and efficient method for position representation in transformer models that addresses the challenge of **extrapolation**: a model's ability to maintain performance on input sequences longer than those seen during training. The authors demonstrate that traditional position encoding methods, such as sinusoidal embeddings, fail to extrapolate effectively, while alternatives like the T5 bias extrapolate but are computationally costly. **ALiBi improves extrapolation** by biasing query-key attention scores with a penalty proportional to the query-key distance, eliminating the need for positional embeddings entirely. The approach is **faster and more memory-efficient** than the baselines, and it enables a 1.3-billion-parameter model trained on shorter sequences to achieve comparable or better perplexity when evaluated on significantly longer ones. The findings suggest that ALiBi's gains when extrapolating come primarily from mitigating the "early token curse" that affects sequence-splitting evaluation methods.
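
To make the core idea concrete, here is a minimal sketch of how a distance-proportional bias can be added to causal attention scores. The bias formula and the geometric slope schedule follow the ALiBi paper, but the function names, tensor shapes, and overall structure are illustrative assumptions rather than the authors' implementation.

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Head-specific slopes: a geometric sequence starting at 2^(-8/num_heads),
    # as described in the ALiBi paper (assumes num_heads is a power of two).
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal attention scores with ALiBi's linear distance penalty.

    q, k: (num_heads, seq_len, head_dim). Note that no positional embeddings
    are added to the inputs; position enters only through the bias.
    """
    num_heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)      # (H, L, L)

    # Distance of each key j behind each query i (clamped to 0 for future keys).
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)       # (L, L)

    # Linear bias: each head subtracts its slope times the distance.
    bias = -alibi_slopes(num_heads)[:, None, None] * distance   # (H, L, L)
    scores = scores + bias

    # Causal mask: queries cannot attend to future keys.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    return scores.masked_fill(mask, float("-inf"))
```

Because the penalty depends only on relative distance, the same function can be applied to sequences longer than any seen in training, which is where the extrapolation behavior comes from.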