Benchmarking Generalization: How AI Learns Beyond Training Data

05/11/2025 36 min Episode 9

Episode Synopsis

In this episode of Inference Time Tactics, Rob and Cooper from Neurometric sit down with Yash Sharma, an AI researcher whose work is reshaping how we understand model generalization. Yash recently completed his PhD at the Max Planck Institute for Intelligent Systems and has held research roles at Google Brain, Meta AI, Amazon, Borealis AI, and IBM Research. His studies on compositional generalization, adversarial robustness, and long-tail benchmarks reveal when and why models succeed—or fail—at reasoning beyond their training data.
If you’re designing inference-time systems, building agents that need reliability, or just want to understand what “generalization” actually means in practice, this conversation bridges deep theory with actionable insight—clear, technical, and strategically grounded.
Key Topics

What it really means for AI systems to generalize beyond their training data
Why large language models still fail in novel or unpredictable scenarios
How inference-time compute can both amplify and reveal generalization limits
What these limits mean for building reliable, agentic AI systems
How to benchmark generalization in real-world settings
Yash’s “Let It Wag!” benchmark for testing long-tail and under-represented concepts
Why genuine scientific breakthroughs (like curing cancer) require more than scaling test-time compute

Connect with Yash Sharma:

Yash Sharma
Let It Wag! Benchmark
Paper: Pretraining Frequency Predicts Compositional Generalization of CLIP (NeurIPS 2024 Workshop)

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Rob May
https://x.com/robmay
https://www.linkedin.com/in/robmay

Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

More episodes of the podcast Inference Time Tactics

Lessons from the Leading Edge: What 420 AI Deployments Reveal About Enterprise Success 22/12/2025

The Thinking Algorithm Leaderboard: Why No Single Model Wins 16/12/2025

Solving the Cold Start Problem in AI Inference 03/10/2025

From MIT Decoding Research to Today’s Inference Tradeoffs 30/09/2025

Drag, Drop, and Deploy: Rethinking How We Build AI Systems 22/09/2025

Beyond Vibe Testing: Smarter Eval for Agentic AI 08/09/2025

GPT-5, The $100B Gap, and The Economics of Inference 29/08/2025

When AI Overthinks: Lessons from the Illusion of Thinking Paper 18/08/2025

The Strategic Trade Offs Behind Inference Time Compute Decisions 12/08/2025

Why Inference Time Compute Is the Future of AI 01/08/2025