o3 - wow

21/12/2024 22 min Season 1 Episode 9

Episode Synopsis

o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.

FrontierMath: https://epoch.ai/frontiermath
https://arxiv.org/pdf/2411.04872
Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough
MLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social
AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1
Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614
Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/
Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893
SWE-bench Verified: https://openai.com/index/introducing-swe-bench-verified/
Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518
David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638
OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/
SimpleBench: https://simple-bench.com/
John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725

00:00 - Introduction
01:19 - What is o3?
03:18 - FrontierMath
05:15 - o4, o5
06:03 - GPQA
06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2
08:13 - 1st Caveat
09:03 - Compositionality?
10:16 - SimpleBench?
13:11 - ARC-AGI, Chollet

More episodes of the podcast AI Explained Official Podcast

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings 26/09/2025

ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops 16/09/2025

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana 26/08/2025

GPT-5 has Arrived 08/08/2025

Genie 3: The World Becomes Playable (DeepMind) 05/08/2025

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …) 21/07/2025

Grok 4 - 10 New Things to Know 10/07/2025

When Will AI Models Blackmail You, and Why? 24/06/2025

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know 12/06/2025

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed 06/06/2025
