Listen "RLHF (Reinforcement Learning from Human Feedback)"
Episode Synopsis
Reinforcement Learning from Human Feedback (RLHF) incorporates human preferences into AI systems, addressing problems where a clear reward function is difficult to specify. The basic pipeline trains a language model, collects human preference data to train a reward model, and then optimizes the language model against that reward model with an RL optimizer. A KL-divergence penalty against the initial model is commonly used as regularization to prevent over-optimization of the reward model. RLHF is one of a broader family of preference fine-tuning techniques, and it has become a crucial post-training step for aligning language models with human values and eliciting desirable behaviors.
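The sketch below illustrates two pieces of the pipeline described in the synopsis: a pairwise (Bradley-Terry style) loss for training a reward model from human preference comparisons, and the KL-regularized reward that the RL optimizer maximizes. It is a minimal illustration, not any specific library's API; function names, array shapes, and the beta coefficient are assumptions made for the example.

```python
# Minimal sketch of two RLHF building blocks (illustrative, not a full trainer).
import numpy as np

def reward_model_pairwise_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen / r_rejected are reward-model scores for the preferred and
    dispreferred responses in each human-labeled comparison pair.
    """
    diff = r_chosen - r_rejected
    # log1p(exp(-diff)) == -log(sigmoid(diff)), computed in a numerically stable way.
    return float(np.mean(np.log1p(np.exp(-diff))))

def kl_regularized_reward(rm_score: float,
                          policy_logprobs: np.ndarray,
                          ref_logprobs: np.ndarray,
                          beta: float = 0.1) -> float:
    """Reward signal seen by the RL optimizer:

        r(x, y) = r_RM(x, y) - beta * sum_t [log pi(y_t) - log pi_ref(y_t)]

    The KL penalty keeps the optimized policy close to the initial (reference)
    model and discourages over-optimization of the reward model.
    """
    kl_penalty = float(np.sum(policy_logprobs - ref_logprobs))
    return rm_score - beta * kl_penalty

# Toy usage with made-up numbers.
print(reward_model_pairwise_loss(np.array([1.4, 0.2]), np.array([0.3, -0.1])))
print(kl_regularized_reward(rm_score=0.9,
                            policy_logprobs=np.array([-1.2, -0.8, -2.0]),
                            ref_logprobs=np.array([-1.3, -1.0, -2.1])))
```

In practice the policy and reference log-probabilities come from the current and frozen initial language models, and the combined reward is fed to an RL optimizer such as PPO; the toy numbers here only show that drifting away from the reference model reduces the effective reward.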
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2 (22/07/2025)
Mixture-of-Recursions (MoR) (18/07/2025)
MeanFlow (10/07/2025)
Mamba (10/07/2025)
LLM Alignment (14/06/2025)
Why We Think (20/05/2025)
Deep Research (12/05/2025)
vLLM (04/05/2025)
Qwen3: Thinking Deeper, Acting Faster (04/05/2025)