Listen "DeepSeek v3"
Episode Synopsis
DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token, trained at roughly 10x lower cost than comparable models. It uses the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. Key features of DeepSeek-V3 are its auxiliary-loss-free load-balancing strategy and its multi-token prediction training objective. The model was pre-trained on 14.8 trillion tokens and then underwent supervised fine-tuning and reinforcement learning. It has demonstrated strong performance across benchmarks, achieving results comparable to leading closed-source models while keeping training costs economical.
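The auxiliary-loss-free load balancing mentioned in the synopsis works by attaching a bias term to each expert's routing score: the bias influences only which experts are selected for a token, not how their outputs are weighted, and it is nudged up for under-loaded experts and down for over-loaded ones. The sketch below illustrates this idea in PyTorch; it is a minimal approximation, not the official implementation, and the names (n_experts, top_k, bias_update_speed) and the sign-based update rule are illustrative assumptions.

```python
# Minimal sketch of auxiliary-loss-free load balancing for MoE routing.
# Assumption: sigmoid affinity scores, top-k expert selection, and a
# per-expert bias adjusted from observed load (hyperparameters are made up).
import torch

n_experts, top_k, bias_update_speed = 8, 2, 0.001

def route(affinity: torch.Tensor, bias: torch.Tensor):
    """affinity: [n_tokens, n_experts] routing scores in (0, 1)."""
    # The bias shifts which experts are chosen ...
    topk_idx = torch.topk(affinity + bias, top_k, dim=-1).indices
    # ... but the gating weights come from the unbiased scores.
    gate = torch.gather(affinity, -1, topk_idx)
    gate = gate / gate.sum(dim=-1, keepdim=True)
    return topk_idx, gate

def update_bias(bias: torch.Tensor, topk_idx: torch.Tensor):
    # Count how many tokens each expert received in this batch.
    load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    # Lower the bias of over-loaded experts, raise it for under-loaded ones.
    return bias - bias_update_speed * torch.sign(load - load.mean())

# Toy usage: random routing scores for 16 tokens.
affinity = torch.sigmoid(torch.randn(16, n_experts))
bias = torch.zeros(n_experts)
topk_idx, gate = route(affinity, bias)
bias = update_bias(bias, topk_idx)
```

Because balance is steered through this bias rather than an auxiliary loss term, the gradient signal stays focused on the language-modeling objective, which is the motivation the episode attributes to this design.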
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2 (22/07/2025)
Mixture-of-Recursions (MoR) (18/07/2025)
MeanFlow (10/07/2025)
Mamba (10/07/2025)
LLM Alignment (14/06/2025)
Why We Think (20/05/2025)
Deep Research (12/05/2025)
vLLM (04/05/2025)
Qwen3: Thinking Deeper, Acting Faster (04/05/2025)