Listen "Mixture-of-Recursions (MoR)"
Episode Synopsis
Mixture-of-Recursions (MoR) is a unified framework built on a Recursive Transformer architecture, designed to enhance the efficiency of large language models. It achieves this by combining three core paradigms: parameter sharing (reusing shared layers across recursion steps), adaptive computation (dynamically assigning different processing depths to individual tokens via lightweight routers), and efficient Key-Value (KV) caching (selectively storing or sharing KV pairs). This integrated approach enables MoR to deliver large-model quality with significantly reduced computational and memory overhead, improving efficiency for both training and inference.
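The sketch below illustrates the core idea from the synopsis in PyTorch: a single transformer block whose weights are shared across recursion steps, plus a lightweight per-token router that gates how many recursion steps each token receives. All module and parameter names (SharedBlock, MoRLayer, max_recursions, the sigmoid gating) are illustrative assumptions for this sketch, not the paper's actual implementation, which uses discrete routing and selective KV caching.

```python
# Minimal sketch of the MoR idea: one shared block applied recursively,
# with a lightweight router deciding, per token, how much depth it gets.
# Names and the soft-gating scheme are assumptions made for illustration.
import torch
import torch.nn as nn


class SharedBlock(nn.Module):
    """One transformer block whose weights are reused at every recursion step."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x


class MoRLayer(nn.Module):
    """Applies the shared block up to `max_recursions` times; a per-token
    router gates which tokens keep recursing (adaptive computation)."""

    def __init__(self, d_model: int, n_heads: int, max_recursions: int = 3):
        super().__init__()
        self.block = SharedBlock(d_model, n_heads)
        self.router = nn.Linear(d_model, 1)  # lightweight per-token router
        self.max_recursions = max_recursions

    def forward(self, x):
        for _ in range(self.max_recursions):
            # Router score in [0, 1]: how strongly a token takes this step.
            gate = torch.sigmoid(self.router(x))  # (batch, seq, 1)
            updated = self.block(x)
            # Tokens with low gate values effectively skip this recursion.
            # (Soft gating here; the paper routes tokens discretely.)
            x = gate * updated + (1.0 - gate) * x
        return x


if __name__ == "__main__":
    layer = MoRLayer(d_model=64, n_heads=4, max_recursions=3)
    tokens = torch.randn(2, 16, 64)  # (batch, sequence, hidden)
    print(layer(tokens).shape)  # torch.Size([2, 16, 64])
```

Because the same SharedBlock parameters serve every recursion step, depth is added without adding parameters; the router is what makes that depth adaptive per token, and in the full method KV pairs are cached or shared selectively across steps to keep memory down.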
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025
DeepSeek-Prover-V2
01/05/2025