180: Reinforcement Learning

17/03/2025 1h 52min Episode 180

Episode Synopsis

Intro topic: Grills

News/Links:
You can’t call yourself a senior until you’ve worked on a legacy project
https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
Recraft might be the most powerful AI image platform I’ve ever used — here’s why
https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
NASA has a list of 10 rules for software development
https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show
Patrick: The Player of Games (Iain M. Banks)
https://a.co/d/1ZpUhGl (non-affiliate)
Jason: Basic Roleplaying Universal Game Engine
https://amzn.to/3ES4p5i

Patreon Plug
https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show
Patrick: Pokemon Sword and Shield
Jason: Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning
  Three types of AI
    Supervised Learning
    Unsupervised Learning
    Reinforcement Learning
  Online vs Offline RL
  Optimization algorithms
    Value optimization
      SARSA
      Q-Learning
    Policy optimization
      Policy Gradients
      Actor-Critic
      Proximal Policy Optimization
  Value vs Policy Optimization
    Value optimization is more intuitive (Value loss)
    Policy optimization is less intuitive at first (policy gradients)
    Converting values to policies in deep learning is difficult
  Imitation Learning
    Supervised policy learning
    Often used to bootstrap reinforcement learning
  Policy Evaluation
    Propensity scoring versus model-based
  Challenges to training an RL model
    Two optimization loops
      Collecting feedback vs updating the model
    Difficult optimization target
    Policy evaluation
  RLHF & GRPO
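
The outline lists SARSA and Q-Learning under value optimization without spelling out how they differ. As a rough illustration only (not taken from the episode), here is a minimal tabular sketch of the two textbook update rules in Python. The function names, the list-of-lists Q-table layout, and the default `alpha` and `gamma` values are all assumptions made for this example.

```python
def q_learning_update(q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state.

    q is a hypothetical Q-table stored as q[state][action] -> float.
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


if __name__ == "__main__":
    # Two states, two actions; state 1 already has some value estimates.
    q = [[0.0, 0.0], [1.0, 2.0]]
    q_learning_update(q, s=0, a=1, r=1.0, s_next=1, done=False)
    sarsa_update(q, s=0, a=0, r=1.0, s_next=1, a_next=0, done=False)
    print(q[0])
```

The only difference is the bootstrap term: Q-Learning uses the maximum value over next actions regardless of what the agent does next, while SARSA uses the value of the next action the current policy actually chose.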

★ Support this podcast on Patreon ★

More episodes of the podcast Programming Throwdown

185: Workflow Orchestrators 04/11/2025

184: Asynchronous Programming 23/09/2025

183: Landing a Software Job in 2025 31/07/2025

182: AI Assisted Coding 30/06/2025

181: Memory Management 12/05/2025

179: Project Planning 03/02/2025

178: Working from Home 03/12/2024

177: Vector Databases 04/11/2024

176: MLOps at SwampUp 24/09/2024

175: Resume Writing 16/08/2024
