Listen "Reward Models | Data Brew | Episode 40"
Episode Synopsis
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on reward models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality (see the sketch after this synopsis).
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.

Connect with Brandon Cui: https://www.linkedin.com/in/bcui19/
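As a rough illustration of the DPO objective mentioned in the highlights above, the sketch below computes the DPO loss from sequence-level log-probabilities of a "chosen" and a "rejected" response under the policy and a frozen reference model. The function name, tensor shapes, and the beta value are illustrative assumptions, not details from the episode.

```python
# Minimal DPO-loss sketch (assumed setup, not from the episode).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss: nudge the policy to prefer the chosen response over the
    rejected one, measured relative to a frozen reference model."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), averaged over the batch of preference pairs
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs
if __name__ == "__main__":
    rand = lambda: torch.randn(4)
    print(dpo_loss(rand(), rand(), rand(), rand()).item())
```

In practice the log-probabilities would come from summing token log-probs of full responses; the sketch only shows the loss itself.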
More episodes of the podcast Data Brew by Databricks
Multimodal AI | Data Brew | Episode 42
07/04/2025
Age of Agents | Data Brew | Episode 41
27/03/2025