o3 breaks (some) records, but AI becomes pay-to-win

25/04/2025 · 14 min · Season 2, Episode 15
Episode Synopsis

A green card, o3 vs Gemini 2.5, 6 benchmarks, and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.

https://app.grayswan.ai/ai-explained
AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:33 - FictionLiveBench
01:37 - PHYBench
02:14 - SimpleBench
02:54 - Virology Capabilities Test
03:13 - Mathematics Performance
04:29 - Vision Benchmarks
05:43 - V* and how o3 works
06:44 - Revenue and costs for you
08:54 - Expensive RL and trade-offs
09:40 - How to spend the OOMs
13:27 - Gray Swan Arena

Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/
PHYBench: https://arxiv.org/pdf/2504.16074
Virologytest: https://www.virologytest.ai/
How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573
Visual puzzles: https://neulab.github.io/VisualPuzzles/
Fiction Bench: https://x.com/ficlive/status/1912863028141244850
https://geobench.org/
https://simple-bench.com/
AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/
USAMO: https://x.com/mbalunovic/status/1914398518896193747
NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/
Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/
IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq
Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq
Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/
GPU Trade-offs: https://x.com/sama/status/1915098951067554030
RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls
Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265
2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030
Model Size: https://x.com/slow_developer/status/1874554473256997201
Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381
Papers on Patreon: https://arxiv.org/pdf/2502.01839 https://arxiv.org/pdf/2504.13837
Chollet Quote: https://x.com/fchollet/status/1912934762580447447
OpenSim: https://opensim.stanford.edu/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/