GRPO | Group Relative Policy Optimization

10/09/2025 14 min Season 1 Episode 184

Episode Synopsis

In episode 114 we discussed DeepSeek's R1 model, which uses GRPO: https://youtu.be/D0w44TGNsUs

So, what is GRPO? GRPO stands for Group Relative Policy Optimization. It is a reinforcement learning (RL) algorithm developed by the creators of the DeepSeek reasoning model R1, designed to enhance the reasoning capabilities of AI models. It was first introduced in the DeepSeekMath paper and was also used in the post-training of DeepSeek-R1.
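
To give a feel for the "group relative" part of the name: as described in the DeepSeekMath paper, GRPO samples a group of answers for the same prompt and scores each answer relative to the rest of its group, rather than relying on a separately learned value model. The snippet below is only a minimal sketch of that normalization step, not DeepSeek's implementation. The function name, the example rewards, the `eps` constant, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group (hypothetical helper).

    Each reward is shifted by the group mean and scaled by the group
    standard deviation. eps is an assumed constant that avoids division
    by zero when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one prompt
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat their group's average get a positive advantage and the others get a negative one, so the policy update favors the better answers within each group.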