Listen "DeepSeek-R1"
Episode Synopsis
DeepSeek-R1 is a language model focused on enhanced reasoning, trained with reinforcement learning (RL) on top of the DeepSeek-V3-Base model. It uses Group Relative Policy Optimization (GRPO), which reduces computational cost by eliminating the separate critic model required by algorithms such as PPO. Training follows a multi-stage pipeline: initial fine-tuning on cold-start data, reasoning-oriented RL, supervised fine-tuning (SFT) on data gathered via rejection sampling, and a final RL stage. A rule-based reward system helps avoid reward hacking, and a language consistency reward applied during RL mitigates language mixing. The model's reasoning capabilities are then distilled into smaller models. DeepSeek-R1 achieves performance comparable to, and sometimes surpassing, OpenAI's o1 series on a range of reasoning, math, and coding tasks.
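For a concrete picture of the GRPO idea mentioned above, the sketch below shows how an advantage can be computed relative to a group of sampled responses rather than from a learned critic's value estimate. The function name, reward values, and normalization details are illustrative assumptions, not DeepSeek's actual implementation.

```python
# Minimal sketch of a GRPO-style group-relative advantage, assuming a
# rule-based reward (e.g., 1.0 if the final answer is correct, else 0.0).
# Names and values are illustrative; this is not DeepSeek's code.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group's mean and standard deviation.

    These per-sample advantages stand in for a critic's value baseline,
    which is what lets GRPO drop the separate critic model used by PPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four responses sampled for one prompt, scored by a rule-based checker.
rewards = [1.0, 0.0, 1.0, 0.0]           # correct, wrong, correct, wrong
advantages = group_relative_advantages(rewards)
print(advantages)                         # correct answers receive positive advantage
```

In this hedged example, responses that score above the group average get a positive advantage and are reinforced, while below-average responses are discouraged, all without training a second value network.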
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025