Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)
Episode Synopsis
Gemini gets a new record on Simple Bench, and on several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, and may have a universal ‘conceptual language’ … and more. Plus practical tips, a note on security, and a Kling vs Veo 2 guest appearance.

Weave: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality: YouTube URLs + Security, Cut-off Date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/
Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87
Simple Bench: https://simple-bench.com/
WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542
Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot
https://aistudio.google.com/prompts/new_chat
Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
LiveBench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314
LiveCode Bench: https://livecodebench.github.io/
SWE-Verified: https://arxiv.org/pdf/2310.06770
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: AI Explained Official Podcast