Spectral Gap: Analysis of Attention Layers and Graph Transformers

10/11/2025 14 min

Listen "Spectral Gap: Analysis of Attention Layers and Graph Transformers"

Episode Synopsis

We review two papers on the spectral gap, one from 2021 and another from 2025. The first presents the **Spectral Attention Network (SAN)**, a Transformer-based architecture for graphs that addresses the difficulty of defining positional encodings on graphs by leveraging the **full Laplacian spectrum** to learn node positions. This **Learned Positional Encoding (LPE)** lets the fully-connected Transformer overcome limitations of traditional Graph Neural Networks (GNNs) such as **over-squashing**, and it achieves competitive or superior performance on standard benchmarks.

The second paper analyzes **stability and signal propagation** in standard softmax-based attention layers at initialization. It identifies a **spectral gap** in the attention matrix that causes **rank collapse** in both the width and depth of the network, hindering effective information flow and leading to **exploding gradients**. As a remedy, the authors propose a **simple modification** that removes the dominant outlier eigenvalue, showing that this fix significantly **mitigates rank collapse** and stabilizes gradient growth in deep Transformers.

Together, the two papers aim to **improve the theoretical foundations and performance** of attention mechanisms: the first applies Transformers to graphs using spectral theory, and the second addresses intrinsic instability in the core Transformer architecture.
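For the first paper, here is a minimal sketch, in the spirit of SAN's LPE but not the authors' released code, of how a positional encoding can be learned from the Laplacian spectrum: each node receives a sequence of (eigenvalue, eigenvector entry) pairs, and a small Transformer encoder pools that sequence into one positional vector per node. The module and function names, dimensions, and pooling choice are assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def laplacian_eigenpairs(adj: np.ndarray, k: int):
    """Eigendecomposition of the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(lap)        # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]            # k lowest-frequency eigenpairs

class LearnedPositionalEncoding(nn.Module):
    """Hypothetical LPE module: each node sees a sequence of (eigenvalue, eigenvector entry)
    pairs, which a small Transformer encoder turns into one positional vector per node."""
    def __init__(self, d_model: int = 16, nhead: int = 2, num_layers: int = 1):
        super().__init__()
        self.embed = nn.Linear(2, d_model)                    # (lambda_j, phi_j[i]) -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, eigvals: torch.Tensor, eigvecs: torch.Tensor) -> torch.Tensor:
        # eigvals: (k,), eigvecs: (n, k) -> per-node sequences of shape (n, k, 2)
        n, k = eigvecs.shape
        pairs = torch.stack([eigvals.expand(n, k), eigvecs], dim=-1)
        h = self.encoder(self.embed(pairs))                   # (n, k, d_model)
        return h.sum(dim=1)                                   # pool over eigenpairs -> (n, d_model)

# Toy usage on a 4-node path graph
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
vals, vecs = laplacian_eigenpairs(adj, k=3)
lpe = LearnedPositionalEncoding()
pos = lpe(torch.tensor(vals, dtype=torch.float32), torch.tensor(vecs, dtype=torch.float32))
print(pos.shape)  # torch.Size([4, 16])
```

The resulting per-node vectors are then concatenated with node features before the fully-connected Transformer layers, which is how the spectral information gives the model a notion of position on the graph.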
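For the second paper, the toy sketch below illustrates the phenomenon and the flavor of the fix under stated assumptions: a softmax attention matrix is row-stochastic, so it always carries an outlier eigenvalue at 1 (right-eigenvector of all ones), separated by a gap from the bulk of the spectrum. Deleting that rank-one component closes the gap; the deflation step shown here is a standard construction assumed for illustration, and the paper's exact modification may differ in its details.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 64, 32
X = rng.normal(size=(n, d))                      # token representations at initialization
Wq = rng.normal(size=(d, d)) / np.sqrt(d)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))  # row-stochastic softmax attention matrix

# The dominant eigenvalue of a row-stochastic matrix is 1; the remaining eigenvalues
# sit well below it, which is the spectral gap driving rank collapse.
mags = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print(f"outlier eigenvalue: {mags[0]:.3f}, next largest: {mags[1]:.3f}")

# Remove the outlier by deflating the rank-one component spanned by the dominant
# right eigenvector (all ones) and its matching left eigenvector.
w, V = np.linalg.eig(A.T)                        # right eigenvectors of A^T = left eigenvectors of A
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])   # left eigenvector for eigenvalue 1
ones = np.ones(n)
A_deflated = A - np.outer(ones, pi) / (pi @ ones)

mags_d = np.sort(np.abs(np.linalg.eigvals(A_deflated)))[::-1]
print(f"largest eigenvalue after removal: {mags_d[0]:.3f}")  # gap is closed
```

With the outlier removed, repeated attention layers no longer pull all token representations toward a single shared direction at initialization, which is the mechanism behind the reported mitigation of rank collapse and the more stable gradient growth.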
Sources:

- October 27, 2021: Rethinking Graph Transformers with Spectral Attention (https://arxiv.org/pdf/2106.03893)
- June 16, 2025: Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers (https://arxiv.org/pdf/2410.07799)