THOR: Hierarchical RL for Mathematical Reasoning
Episode Synopsis
This September 2025 paper describes THOR (Tool-Integrated Hierarchical Optimization via RL), an approach for improving the mathematical reasoning and code generation of Large Language Models (LLMs) by integrating external code-execution tools. The methodology introduces TIRGen, a pipeline for creating high-quality Tool-Integrated Reasoning (TIR) data, which is then used to train the model with a hierarchical reinforcement learning (RL) strategy. The RL framework combines trajectory-level optimization, which targets overall problem-solving ability, with step-level optimization, which corrects code generation errors; together they address the sparse-reward problem common in long reasoning tasks. According to the paper, THOR achieves state-of-the-art (SOTA) performance across a range of mathematical and code benchmarks, for both reasoning and non-reasoning models. Finally, the system uses code-execution feedback to drive a self-correction mechanism at inference time, which further improves performance, especially on more challenging problems.
Source: https://arxiv.org/pdf/2509.13761
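To make the idea of self-correction from code-execution feedback more concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names `run_code`, `solve_with_self_correction`, and `toy_generate`, the retry limit, and the use of `exec` in place of a sandboxed tool are all assumptions made for illustration, and the toy generator stands in for an actual LLM.

```python
import contextlib
import io
from typing import Callable, Optional, Tuple


def run_code(code: str) -> Tuple[bool, str]:
    """Execute a snippet; return (succeeded, captured stdout or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Assumption: a real system would use an isolated sandbox, not exec.
            exec(code, {})
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue().strip()


def solve_with_self_correction(
    generate: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask `generate` for code, retrying with the error message on failure.

    `generate` is a hypothetical stand-in for the model: it receives the
    previous execution error (or None on the first try) and returns code.
    Returns (output, attempts_used); output is None if every attempt failed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        ok, output = run_code(code)
        if ok:
            return output, attempt
        feedback = output  # pass the error back for the next attempt
    return None, max_attempts


def toy_generate(feedback: Optional[str]) -> str:
    """Toy model: emits buggy code first, then a fix once it sees an error."""
    if feedback is None:
        return "print(1 / 0)"
    return "print(sum(range(1, 11)))"


print(solve_with_self_correction(toy_generate))  # ('55', 2)
```

The first attempt raises a `ZeroDivisionError`, the error text is fed back to the generator, and the second attempt succeeds. The point of the sketch is only that an execution failure is an immediate, checkable signal that can trigger regeneration.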