Open-o3 Video: Spatio-Temporal Grounded Reasoning

26/10/2025 18 min

Listen "Open-o3 Video: Spatio-Temporal Grounded Reasoning"

Episode Synopsis

This episode covers the October 25, 2025 paper introducing **Open-o3 Video**, a framework from researchers at **Peking University** and **ByteDance** that advances video reasoning by grounding answers in explicit spatio-temporal evidence. Unlike prior models that generate only textual rationales, Open-o3 Video highlights the key **timestamps** and **bounding boxes** behind each answer, tying its reasoning to concrete visual observations. To train this behavior, the authors curate two new datasets, **STGR-CoT-30k** and **STGR-RL-36k**, and use a two-stage strategy: supervised fine-tuning followed by **Group Sequence Policy Optimization (GSPO)** with specialized rewards, including adaptive temporal proximity and temporal gating mechanisms (a rough sketch of the idea appears below). The approach significantly improves performance on the **V-STAR benchmark** and other video understanding tasks, making video reasoning more accurate and verifiable.

Source: https://arxiv.org/pdf/2510.20579
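To make the reward design concrete, here is a minimal, hypothetical sketch of how a temporal-proximity reward with temporal gating of a spatial IoU reward could look. The function names, the Gaussian decay, and the parameters `sigma` and `gate_tol` are illustrative assumptions, not the paper's exact formulation; consult the arXiv paper for the actual rewards.

```python
import math

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def spatiotemporal_reward(t_pred, t_gt, box_pred, box_gt,
                          sigma=1.0, gate_tol=1.0):
    """Illustrative reward: temporal proximity plus gated spatial IoU.

    sigma and gate_tol are assumed hyperparameters, not values from the paper.
    """
    # Temporal proximity: reward decays smoothly as the predicted
    # timestamp drifts from the annotated one (Gaussian kernel here;
    # the paper's "adaptive" variant presumably adjusts the tolerance).
    r_time = math.exp(-((t_pred - t_gt) ** 2) / (2 * sigma ** 2))

    # Temporal gating: the spatial (bounding-box) reward only counts
    # when the predicted timestamp lands close enough to the ground
    # truth, so boxes drawn at the wrong moment earn nothing.
    r_space = iou(box_pred, box_gt) if abs(t_pred - t_gt) <= gate_tol else 0.0

    return r_time + r_space
```

The gating step is the key design choice this sketch tries to capture: rewarding box overlap unconditionally would let a policy collect spatial credit for objects localized at irrelevant moments, whereas gating forces temporal and spatial grounding to be correct together.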
