DeepSeek V3

08/01/2025 14 min Season 1 Episode 1


Episode Synopsis

This episode discusses DeepSeek-V3, a 671B-parameter Mixture-of-Experts large language model. It covers the model's architecture, including Multi-Head Latent Attention and an innovative auxiliary-loss-free load balancing strategy for DeepSeekMoE. The training process is also described, encompassing pre-training on 14.8 trillion tokens and post-training with supervised fine-tuning and reinforcement learning.
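The auxiliary-loss-free load balancing idea mentioned above can be illustrated with a minimal sketch: a learnable per-expert bias is added to the routing scores when selecting the top-k experts (the bias affects only the selection, not the gating weights), and after each step the bias is nudged down for overloaded experts and up for underloaded ones. The function names, the sign-based update, and the step size `gamma` here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def topk_route(scores: np.ndarray, bias: np.ndarray, k: int) -> np.ndarray:
    """Pick the top-k experts per token using bias-adjusted scores.

    The bias influences only *which* experts are chosen; the actual
    gating weights would still be computed from the raw scores.
    """
    return np.argsort(-(scores + bias), axis=-1)[:, :k]

def update_bias(bias: np.ndarray, chosen: np.ndarray,
                n_experts: int, gamma: float = 1e-3) -> np.ndarray:
    """One balancing step (illustrative): lower the bias of overloaded
    experts and raise it for underloaded ones via a sign update."""
    counts = np.bincount(chosen.ravel(), minlength=n_experts)
    return bias - gamma * np.sign(counts - counts.mean())

# Toy demo: 4 tokens, 3 experts, expert 0 dominates the raw scores.
scores = np.zeros((4, 3))
scores[:, 0] = 1.0
bias = np.zeros(3)
chosen = topk_route(scores, bias, k=1)      # every token picks expert 0
bias = update_bias(bias, chosen, n_experts=3)
# expert 0's bias is nudged down; the idle experts' biases are nudged up
```

Repeating this over training steps steers tokens toward underused experts without adding an auxiliary balancing loss to the objective.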

paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf


More episodes of the podcast WAP: Weekly AI Papers