DeepSeek V3

08/01/2025 14 min Season 1 Episode 1


Episode Synopsis

This episode discusses DeepSeek-V3, a 671B-parameter Mixture-of-Experts large language model. It covers the model's architecture, including Multi-Head Latent Attention and an innovative auxiliary-loss-free load balancing strategy for DeepSeekMoE. The training process is also described, encompassing pre-training on 14.8 trillion tokens and post-training with supervised fine-tuning and reinforcement learning.
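The auxiliary-loss-free load balancing idea mentioned above can be illustrated with a minimal sketch: a learnable per-expert bias is added to the routing scores when selecting the top-k experts (the bias affects only the selection, not the gating weights), and after each step the bias is nudged down for overloaded experts and up for underloaded ones. The function names, the sign-based update, and the step size `gamma` here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def topk_route(scores: np.ndarray, bias: np.ndarray, k: int) -> np.ndarray:
    """Pick the top-k experts per token using bias-adjusted scores.

    The bias influences only *which* experts are chosen; the actual
    gating weights would still be computed from the raw scores.
    """
    return np.argsort(-(scores + bias), axis=-1)[:, :k]

def update_bias(bias: np.ndarray, chosen: np.ndarray,
                n_experts: int, gamma: float = 1e-3) -> np.ndarray:
    """One balancing step (illustrative): lower the bias of overloaded
    experts and raise it for underloaded ones via a sign update."""
    counts = np.bincount(chosen.ravel(), minlength=n_experts)
    return bias - gamma * np.sign(counts - counts.mean())

# Toy demo: 4 tokens, 3 experts, expert 0 dominates the raw scores.
scores = np.zeros((4, 3))
scores[:, 0] = 1.0
bias = np.zeros(3)
chosen = topk_route(scores, bias, k=1)      # every token picks expert 0
bias = update_bias(bias, chosen, n_experts=3)
# expert 0's bias is nudged down; the idle experts' biases are nudged up
```

Repeating this over training steps steers tokens toward underused experts without adding an auxiliary balancing loss to the objective.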

paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf


More episodes of the podcast WAP: Weekly AI Papers