Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)

28/03/2025 21 min Season 2 Episode 11


Episode Synopsis

Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ … https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained … and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube urls + Security - cut-off date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

https://simple-bench.com/

WeirdML: https://htihle.github.io/weirdml.html

https://x.com/htihle/status/1905014058228625542

Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model

https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

https://aistudio.google.com/prompts/new_chat

Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

Live bench: https://livebench.ai/#/

Paper: https://arxiv.org/pdf/2406.19314

LiveCode Bench: https://livecodebench.github.io/

SWE-Verified: https://arxiv.org/pdf/2310.06770

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

More episodes of the podcast AI Explained Official Podcast

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings 26/09/2025

ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops 16/09/2025

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana 26/08/2025

GPT-5 has Arrived 08/08/2025

Genie 3: The World Becomes Playable (DeepMind) 05/08/2025

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …) 21/07/2025

Grok 4 - 10 New Things to Know 10/07/2025

When Will AI Models Blackmail You, and Why? 24/06/2025

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know 12/06/2025

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed 06/06/2025

