Can we really trust reasoning

07/01/2026 51 min Episode 40

Episode Synopsis


Pierce and Richard cover the news that dropped over the holiday break: getting breaking news incorporated into chatbots, OpenAI's "code red" over Google's Gemini 3, benchmarking how reliable chain of thought is for introspecting model behavior, and a review of Claude Skills.

Further reading:
- https://www.wired.com/story/us-invaded-venezuela-and-captured-nicolas-maduro-chatgpt-disagrees
- https://fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini-ceo-sundar-pichai/
- https://openai.com/index/evaluating-chain-of-thought-monitorability/
- https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview