Listen "180: Reinforcement Learning"
Episode Synopsis
Intro topic: GrillsNews/Links:You can’t call yourself a senior until you’ve worked on a legacy projecthttps://www.infobip.com/developers/blog/seniors-working-on-a-legacy-projectRecraft might be the most powerful AI image platform I’ve ever used — here’s whyhttps://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-whyNASA has a list of 10 rules for software developmenthttps://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htmAMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GREhttps://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre Book of the ShowPatrick: The Player of Games (Ian M Banks)https://a.co/d/1ZpUhGl (non-affiliate)Jason: Basic Roleplaying Universal Game Enginehttps://amzn.to/3ES4p5iPatreon Plug https://www.patreon.com/programmingthrowdown?ty=hTool of the ShowPatrick: Pokemon Sword and ShieldJason: Features and Labels ( https://fal.ai )Topic: Reinforcement LearningThree types of AISupervised LearningUnsupervised LearningReinforcement LearningOnline vs Offline RLOptimization algorithmsValue optimizationSARSAQ-LearningPolicy optimizationPolicy GradientsActor-CriticProximal Policy OptimizationValue vs Policy OptimizationValue optimization is more intuitive (Value loss)Policy optimization is less intuitive at first (policy gradients)Converting values to policies in deep learning is difficultImitation LearningSupervised policy learningOften used to bootstrap reinforcement learningPolicy EvaluationPropensity scoring versus model-basedChallenges to training RL modelTwo optimization loopsCollecting feedback vs updating the modelDifficult optimization targetPolicy evaluationRLHF & GRPO
★ Support this podcast on Patreon ★
More episodes of the podcast Programming Throwdown
185: Workflow Orchestrators
04/11/2025
184: Asynchronous Programming
23/09/2025
183: Landing a Software Job in 2025
31/07/2025
182: AI Assisted Coding
30/06/2025
181: Memory Management
12/05/2025
179: Project Planning
03/02/2025
178: Working from Home
03/12/2024
177: Vector Databases
04/11/2024
176: MLOps at SwampUp
24/09/2024
175: Resume Writing
16/08/2024
ZARZA We are Zarza, the prestigious firm behind major projects in information technology.