Reinforcement Learning Under Unmeasured Confounding

28/06/2025 1h 4min

Listen "Reinforcement Learning Under Unmeasured Confounding"

Episode Synopsis

This paper introduces a novel framework for offline reinforcement learning (RL) that addresses settings with continuous action spaces and unmeasured confounding variables. The authors develop a nonparametric estimator of the policy value in an infinite-horizon setting by establishing a new identification result built on "reward-inducing proxy variables." Building on this result, they propose a minimax estimator of the policy value and a policy-gradient-based algorithm for learning an optimal policy, with theoretical guarantees covering consistency and error bounds. The methodology is evaluated through extensive simulations and a real-world application to the German Family Panel data, where the goal is to identify strategies for improving long-term relationship satisfaction.
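
For context, the estimation target in this setting is the infinite-horizon discounted policy value: the expected cumulative discounted reward obtained when actions are drawn from the evaluated policy π. The notation below is standard and illustrative, not taken from the paper:

    V(π) = E_π [ Σ_{t=0}^{∞} γ^t R_t ],   with discount factor 0 < γ < 1.

Under unmeasured confounding, this quantity is generally not identified from the logged behavior data alone, which is why the paper's proxy-variable identification result is needed before any estimator can be constructed.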
