ELASTIC: Linear Attention for Sequential Interest Compression

31/10/2025 12 min

Listen "ELASTIC: Linear Attention for Sequential Interest Compression"

Episode Synopsis

The February 12, 2025 paper from Kuaishou introduces **ELASTIC**, an Efficient Linear Attention for SequenTial Interest Compression framework designed to address the **scalability issues** of traditional transformer-based sequential recommender systems, which suffer from quadratic complexity with respect to sequence length. ELASTIC proposes a **Linear Dispatcher Attention (LDA) layer** that compresses long user behavior sequences into a compact representation, yielding **linear time complexity**, significantly lower GPU memory usage, and faster inference. The framework also incorporates an **Interest Memory Retrieval (IMR) technique** that sparsely retrieves from a large interest memory bank to expand the model's capacity and **maintain recommendation accuracy** despite the computational optimizations. Experiments on datasets such as ML-1M and XLong show that ELASTIC **outperforms baseline methods** while offering superior computational efficiency, especially when modeling long user sequences.

Source: https://arxiv.org/pdf/2408.09380
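The core LDA idea discussed in the episode is to avoid forming the full L x L attention map by routing information through a small set of learnable dispatcher tokens: the dispatchers first attend to the long sequence, then the sequence attends back to the compressed dispatchers, so the cost stays linear in sequence length. Below is a minimal PyTorch sketch of that pattern; the class name, the number of dispatchers, and the two-step gather/broadcast structure are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of dispatcher-style linear attention (names are assumptions).
import torch
import torch.nn as nn

class LinearDispatcherAttention(nn.Module):
    """Compresses a length-L sequence into k << L dispatcher slots, then lets
    the sequence read from those slots. Both steps are cross-attention with
    cost O(L * k), i.e. linear in sequence length for a fixed k."""
    def __init__(self, d_model: int, num_dispatchers: int = 32, num_heads: int = 4):
        super().__init__()
        # Learnable dispatcher (compression) tokens.
        self.dispatchers = nn.Parameter(torch.randn(num_dispatchers, d_model) * 0.02)
        # Dispatchers attend to the full sequence (gather step).
        self.gather = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Sequence attends back to the compressed dispatchers (broadcast step).
        self.broadcast = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, L, d_model) embeddings of the user's behavior sequence.
        batch = seq.size(0)
        disp = self.dispatchers.unsqueeze(0).expand(batch, -1, -1)  # (batch, k, d)
        # Step 1: each dispatcher summarizes the long sequence -> O(k * L).
        compressed, _ = self.gather(query=disp, key=seq, value=seq)
        # Step 2: each position reads the compact summary -> O(L * k).
        out, _ = self.broadcast(query=seq, key=compressed, value=compressed)
        return out  # (batch, L, d_model); no L x L attention matrix is formed


if __name__ == "__main__":
    x = torch.randn(2, 1000, 64)      # two users, 1000 interactions each
    lda = LinearDispatcherAttention(64)
    print(lda(x).shape)               # torch.Size([2, 1000, 64])
```

The IMR component can likewise be pictured as a sparse top-k lookup into a large bank of interest embeddings, so model capacity grows with the bank while each forward pass touches only a few entries. The sketch below again uses assumed names and shapes rather than the paper's actual retrieval mechanism.

```python
# A minimal sketch of sparse interest-memory retrieval (names/shapes are assumptions).
import torch
import torch.nn.functional as F

def retrieve_interests(user_query: torch.Tensor,
                       memory_bank: torch.Tensor,
                       top_k: int = 8) -> torch.Tensor:
    # user_query: (batch, d); memory_bank: (memory_size, d), memory_size can be large.
    scores = user_query @ memory_bank.t()               # (batch, memory_size)
    weights, idx = scores.topk(top_k, dim=-1)           # sparse: only top_k entries used
    selected = memory_bank[idx]                          # (batch, top_k, d)
    weights = F.softmax(weights, dim=-1).unsqueeze(-1)   # normalize over the top_k hits
    return (weights * selected).sum(dim=1)               # (batch, d) aggregated interest
```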
