DeepSeek-V3 Technical Deep Dive
Episode Synopsis
DeepSeek-V3 is an open-weights large language model. Its key features include a remarkably low development cost, achieved through innovative techniques like inference-time computing and an auxiliary-loss-free load balancing strategy. The model's architecture uses Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA) for efficiency. Extensive testing on a range of benchmarks shows strong performance, comparable to and in some cases exceeding leading closed-source models. Finally, the paper offers recommendations for future AI hardware design based on the DeepSeek-V3 development process. Source: https://arxiv.org/pdf/2412.19437v1
More episodes of the podcast AI Blindspot
AIE World's Fair Recap of Day 2
24/06/2025
Understanding Agentic Workflows
20/05/2025
Building Effective AI Agents
04/05/2025
Agentic Design Pattern III - Tool Use
20/12/2024
Agentic Design Pattern II - Reflection
02/12/2024
Agentic Design Pattern I - Planning
04/11/2024
AI Agents
29/10/2024
AI Utopia
20/10/2024