Listen "Mamba"
Episode Synopsis
Mamba is a novel deep learning architecture whose computation and memory scale linearly with sequence length, addressing the quadratic cost of Transformer attention. Its selective State Space Model (SSM) layer makes key parameters functions of the input, letting the model decide, token by token, what to "remember" and what to "forget". Its recurrent "selective scan" is implemented with a hardware-aware parallel algorithm that uses kernel fusion to keep intermediate states in fast GPU memory and recomputation to reduce the memory footprint during training. The result is significantly faster inference (up to 5x the throughput of comparable Transformers) and superior long-context handling.
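To make the selective-SSM idea concrete, here is a minimal sequential sketch in NumPy. The names and shapes (d_model, d_state, W_B, W_C, w_delta) are illustrative assumptions, the discretization is a simplified Euler-style step rather than the paper's exact zero-order hold, and the plain Python loop stands in for what Mamba actually runs as a fused, parallel scan kernel on the GPU.

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, w_delta):
    """Sequential sketch of a selective SSM recurrence (illustrative only).

    x:       (seq_len, d_model) input sequence
    A:       (d_model, d_state) per-channel state-transition parameters (negative)
    W_B:     (d_model, d_state) projection making B input-dependent
    W_C:     (d_model, d_state) projection making C input-dependent
    w_delta: (d_model,) per-channel step-size projection
    Returns y: (seq_len, d_model).
    """
    seq_len, d_model = x.shape
    d_state = A.shape[1]
    h = np.zeros((d_model, d_state))   # hidden state, one vector per channel
    y = np.empty_like(x)
    for t in range(seq_len):
        xt = x[t]
        # Selection: delta, B, C are recomputed from the current token, so the
        # model can choose per-input what to keep in state and what to discard.
        delta = np.logaddexp(0.0, xt * w_delta)   # softplus, (d_model,)
        B = xt @ W_B                              # (d_state,)
        C = xt @ W_C                              # (d_state,)
        # Discretize: large delta -> A_bar near 0 (forget the past);
        # small delta -> A_bar near 1 (carry the state forward).
        A_bar = np.exp(delta[:, None] * A)        # (d_model, d_state)
        h = A_bar * h + (delta * xt)[:, None] * B[None, :]
        y[t] = h @ C                              # (d_model,)
    return y

# Tiny smoke test with random parameters.
rng = np.random.default_rng(0)
d_model, d_state, seq_len = 4, 8, 16
x = rng.standard_normal((seq_len, d_model))
A = -np.exp(rng.standard_normal((d_model, d_state)))  # keep dynamics stable
y = selective_ssm(
    x, A,
    rng.standard_normal((d_model, d_state)) / np.sqrt(d_model),
    rng.standard_normal((d_model, d_state)) / np.sqrt(d_model),
    rng.standard_normal(d_model),
)
print(y.shape)  # (16, 4)
```

The contrast with a classic, non-selective SSM is that delta, B, and C here depend on each token rather than being fixed, which is what lets the state update gate new information in or decay old information away.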
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2 (22/07/2025)
Mixture-of-Recursions (MoR) (18/07/2025)
MeanFlow (10/07/2025)
LLM Alignment (14/06/2025)
Why We Think (20/05/2025)
Deep Research (12/05/2025)
vLLM (04/05/2025)
Qwen3: Thinking Deeper, Acting Faster (04/05/2025)
DeepSeek-Prover-V2 (01/05/2025)