GRPO | Group Relative Policy Optimization
Episode Synopsis
In episode 114 we've been discussing DeepSeek's R1 model, which uses GRPO: https://youtu.be/D0w44TGNsUs

So, what is GRPO? GRPO stands for Group Relative Policy Optimization. It is a reinforcement learning (RL) algorithm developed by the creators of the DeepSeek reasoning model R1, designed to enhance the reasoning capabilities of AI models. It was first introduced in the DeepSeekMath paper and was also used in the post-training of DeepSeek-R1.

Hosted on Acast. See acast.com/privacy for more information.
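The "group relative" part of the name refers to how GRPO scores each answer: instead of training a separate critic (value) model as a baseline, it samples a group of outputs for the same prompt and judges each output's reward against the rest of its group. The sketch below shows only that normalization step, assuming one scalar reward per output (outcome-level rewards). The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are illustrative assumptions, not DeepSeek's code; the full algorithm also includes a clipped policy-ratio objective and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: turn a group's rewards into advantages.

    rewards: scalar rewards for G outputs sampled for the SAME prompt.
    Returns one advantage per output: (r_i - mean(r)) / (std(r) + eps).
    Outputs that beat the group average get a positive advantage,
    outputs below it get a negative one. No learned critic is involved.
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mu = mean(rewards)
    # Assumption: population std over the group; some implementations
    # use the sample (unbiased) std instead.
    sigma = pstdev(rewards)
    # eps keeps the division safe when every output got the same reward,
    # in which case every advantage is 0 (no learning signal).
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one math problem: two correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With two correct and two incorrect answers, the correct ones receive an advantage of about +1 and the incorrect ones about -1, so the policy update pushes toward the answers that outperformed their own group.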