Listen "DeepSeek-V2"
Episode Synopsis
DeepSeek-V2 is a Mixture-of-Experts (MoE) language model that balances strong performance with economical training and efficient inference. It has 236B total parameters, of which 21B are activated per token, and supports a context length of 128K tokens. Key architectural innovations include Multi-Head Latent Attention (MLA), which compresses the KV cache for faster inference, and DeepSeekMoE, which enables economical training through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts maximum generation throughput by 5.76 times. It is pre-trained on 8.1T tokens of high-quality data and further aligned through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).
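To make the KV-cache compression idea behind MLA concrete, here is a minimal sketch in PyTorch. It caches only a small per-token latent vector and up-projects it to per-head keys and values at attention time, which is the core of why the cache shrinks. The dimensions (d_model, n_heads, d_latent) are illustrative placeholders, not DeepSeek-V2's actual hyperparameters, and details such as the decoupled rotary-position branch and causal masking are omitted for brevity.

```python
# Minimal sketch of MLA-style low-rank KV compression (illustrative dimensions,
# not DeepSeek-V2's real configuration; RoPE and causal masking omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a compact latent; this is all that gets cached.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-project the cached latent back to per-head keys and values at attention time.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        # x: (batch, new_seq, d_model); kv_cache: (batch, past_seq, d_latent) or None
        b, t, _ = x.shape
        latent = self.kv_down(x)                       # (b, t, d_latent): small, cache-friendly
        if kv_cache is not None:
            latent = torch.cat([kv_cache, latent], dim=1)
        s = latent.shape[1]

        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)

        out = F.scaled_dot_product_attention(q, k, v)  # (b, n_heads, t, d_head)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out), latent              # latent doubles as the updated KV cache

if __name__ == "__main__":
    attn = LatentKVAttention()
    x_prefill = torch.randn(1, 4, 512)
    y, cache = attn(x_prefill)                 # prefill: cache holds 4 latent vectors of size 64
    x_decode = torch.randn(1, 1, 512)
    y, cache = attn(x_decode, kv_cache=cache)  # decode one token against the compressed cache
    print(y.shape, cache.shape)                # (1, 1, 512) and (1, 5, 64)
```

Because the cache stores d_latent values per token instead of 2 * n_heads * d_head, its memory footprint drops by roughly that ratio, which is the mechanism behind the reported 93.3% KV-cache reduction.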
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2 (22/07/2025)
Mixture-of-Recursions (MoR) (18/07/2025)
MeanFlow (10/07/2025)
Mamba (10/07/2025)
LLM Alignment (14/06/2025)
Why We Think (20/05/2025)
Deep Research (12/05/2025)
vLLM (04/05/2025)
Qwen3: Thinking Deeper, Acting Faster (04/05/2025)