Listen "Can we really trust reasoning"
Episode Synopsis
Pierce and Richard cover the news that dropped over the holiday break: getting breaking news incorporated into chatbots, OpenAI's "code red" over Google's Gemini 3, benchmarking how reliably chain of thought can be used to inspect model behavior, and a review of Claude Skills.
Further reading:
- https://www.wired.com/story/us-invaded-venezuela-and-captured-nicolas-maduro-chatgpt-disagrees
- https://fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini-ceo-sundar-pichai/
- https://openai.com/index/evaluating-chain-of-thought-monitorability/
- https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
More episodes of the podcast Pretrained
The sci-fi to startup pipeline
14/01/2026
Our biggest predictions for 2026
19/12/2025
AI's ten big moments of 2025
17/12/2025
Looking back on a year of product market fit
12/12/2025
Looking back on three years of an AI PhD
10/12/2025
OpenReview got "hacked"
03/12/2025
Pretraining is back in vogue with Gemini 3
27/11/2025
Teaching cars about traffic lights
21/11/2025
Pretty pretty please can you hack this
19/11/2025
How AI research actually gets published
15/11/2025