Gemini's Multimodality

02/07/2025 44 min Episode 10

Episode Synopsis

Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where "everything is vision." Learn about the differences between video and image understanding, token representations, higher-FPS video sampling, and more.

Chapters:

0:00 - Intro
1:12 - Why Gemini is natively multimodal
2:23 - The technology behind multimodal models
5:15 - Video understanding with Gemini 2.5
9:25 - Deciding what to build next
13:23 - Building new product experiences with multimodal AI
17:15 - The vision for proactive assistants
24:13 - Improving video usability with variable FPS and frame tokenization
27:35 - What's next for Gemini's multimodal development
31:47 - Deep dive on Gemini's document understanding capabilities
37:56 - The teamwork and collaboration behind Gemini
40:56 - What's next with model behavior

Watch on YouTube: https://www.youtube.com/watch?v=K4vXvaRV0dw