THOR: Hierarchical RL for Mathematical Reasoning

19/09/2025 17 min


Episode Synopsis

This September 2025 paper describes THOR (Tool-Integrated Hierarchical Optimization via RL), an approach that improves the mathematical reasoning and code generation of Large Language Models (LLMs) by integrating external code-execution tools. It introduces TIRGen, a pipeline for creating high-quality Tool-Integrated Reasoning (TIR) data, which is crucial for training the model with a hierarchical reinforcement learning (RL) strategy. The RL framework combines trajectory-level optimization, which targets overall problem-solving ability, with step-level optimization, which corrects code generation errors. Together they address the sparse-reward problem common in long reasoning tasks. Experimental results show that THOR achieves state-of-the-art (SOTA) performance across various mathematical and code benchmarks for both reasoning and non-reasoning models. Finally, the system uses code execution feedback as a self-correction mechanism at inference time, which further improves performance, especially on more challenging problems.

Source: https://arxiv.org/pdf/2509.13761
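The self-correction idea mentioned in the synopsis (run the generated code, and if it fails, feed the error back and try again) can be illustrated with a minimal Python sketch. This is not THOR's implementation: `run_snippet`, `solve_with_self_correction`, and `toy_propose` are hypothetical names, the "model" is a hard-coded stand-in, and `exec` is used only for illustration and is not a safe sandbox.

```python
import contextlib
import io
from typing import Callable, Optional, Tuple


def run_snippet(code: str) -> Tuple[bool, str]:
    """Execute a code string and return (succeeded, stdout or error message)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # fresh namespace per attempt; NOT a real sandbox
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue().strip()


def solve_with_self_correction(
    propose: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask `propose` for code, run it, and pass any error back for a retry."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = propose(feedback)
        ok, output = run_snippet(code)
        if ok:
            return output
        feedback = output  # execution error becomes the next attempt's hint
    return None


def toy_propose(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a model: buggy first attempt, fixed on retry."""
    if feedback is None:
        return "print(sum(range(1, 11)) / 0)"
    return "print(sum(range(1, 11)))"


result = solve_with_self_correction(toy_propose)
print(result)
```

In this toy run the first attempt fails with a division error, the error message is passed back as feedback, and the second attempt succeeds. A real system would replace `toy_propose` with a model call and run the code in an isolated environment.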

More episodes of the podcast AI: post transformers

Attention with a bias 17/01/2026

Squisher: Approximating the Fisher Information Matrix and use cases 17/01/2026

NVIDIA: TTT-E2E: Unlocking Long-Context Learning via End-to-End Test-Time Training 17/01/2026

Scaling laws: long context length and in context learning 17/01/2026

DeepSeek Engram: Scaling Large Language Models via Conditional Memory Lookup 14/01/2026

PageANN: Scalable Disk ANNS with Page-Aligned Graphs 07/12/2025

NeurIPS 2025: Homogeneous Keys, Heterogeneous Values 04/12/2025

NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free 29/11/2025

NeurIPS 2025: Large Language Diffusion Models 29/11/2025

NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example 29/11/2025

