ELASTIC: Linear Attention for Sequential Interest Compression

31/10/2025 12 min

Listen "ELASTIC: Linear Attention for Sequential Interest Compression"

Episode Synopsis

The February 12, 2025 paper from Kuaishou introduces **ELASTIC**, an Efficient Linear Attention for SequenTial Interest Compression framework designed to address the **scalability issues** of traditional transformer-based sequential recommender systems, which suffer from quadratic complexity with respect to sequence length. ELASTIC proposes a **Linear Dispatcher Attention (LDA) layer** that compresses long user behavior sequences into a compact representation, yielding **linear time complexity**, significantly lower GPU memory usage, and faster inference. The framework also incorporates an **Interest Memory Retrieval (IMR) technique** that sparsely retrieves from a large interest memory bank to expand the model's capacity and **maintain recommendation accuracy** despite the computational optimizations. Experiments on datasets such as ML-1M and XLong show that ELASTIC **outperforms baseline methods** while offering superior computational efficiency, especially when modeling long user sequences.

Source: https://arxiv.org/pdf/2408.09380
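The core LDA idea discussed in the episode is to avoid forming the full L x L attention map by routing information through a small set of learnable dispatcher tokens: the dispatchers first attend to the long sequence, then the sequence attends back to the compressed dispatchers, so the cost stays linear in sequence length. Below is a minimal PyTorch sketch of that pattern; the class name, the number of dispatchers, and the two-step gather/broadcast structure are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of dispatcher-style linear attention (names are assumptions).
import torch
import torch.nn as nn

class LinearDispatcherAttention(nn.Module):
    """Compresses a length-L sequence into k << L dispatcher slots, then lets
    the sequence read from those slots. Both steps are cross-attention with
    cost O(L * k), i.e. linear in sequence length for a fixed k."""
    def __init__(self, d_model: int, num_dispatchers: int = 32, num_heads: int = 4):
        super().__init__()
        # Learnable dispatcher (compression) tokens.
        self.dispatchers = nn.Parameter(torch.randn(num_dispatchers, d_model) * 0.02)
        # Dispatchers attend to the full sequence (gather step).
        self.gather = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Sequence attends back to the compressed dispatchers (broadcast step).
        self.broadcast = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, L, d_model) embeddings of the user's behavior sequence.
        batch = seq.size(0)
        disp = self.dispatchers.unsqueeze(0).expand(batch, -1, -1)  # (batch, k, d)
        # Step 1: each dispatcher summarizes the long sequence -> O(k * L).
        compressed, _ = self.gather(query=disp, key=seq, value=seq)
        # Step 2: each position reads the compact summary -> O(L * k).
        out, _ = self.broadcast(query=seq, key=compressed, value=compressed)
        return out  # (batch, L, d_model); no L x L attention matrix is formed


if __name__ == "__main__":
    x = torch.randn(2, 1000, 64)      # two users, 1000 interactions each
    lda = LinearDispatcherAttention(64)
    print(lda(x).shape)               # torch.Size([2, 1000, 64])
```

The IMR component can likewise be pictured as a sparse top-k lookup into a large bank of interest embeddings, so model capacity grows with the bank while each forward pass touches only a few entries. The sketch below again uses assumed names and shapes rather than the paper's actual retrieval mechanism.

```python
# A minimal sketch of sparse interest-memory retrieval (names/shapes are assumptions).
import torch
import torch.nn.functional as F

def retrieve_interests(user_query: torch.Tensor,
                       memory_bank: torch.Tensor,
                       top_k: int = 8) -> torch.Tensor:
    # user_query: (batch, d); memory_bank: (memory_size, d), memory_size can be large.
    scores = user_query @ memory_bank.t()               # (batch, memory_size)
    weights, idx = scores.topk(top_k, dim=-1)           # sparse: only top_k entries used
    selected = memory_bank[idx]                          # (batch, top_k, d)
    weights = F.softmax(weights, dim=-1).unsqueeze(-1)   # normalize over the top_k hits
    return (weights * selected).sum(dim=1)               # (batch, d) aggregated interest
```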
