Listen "Qwen3: Thinking Deeper, Acting Faster"
Episode Synopsis
Qwen3 models introduce both Mixture-of-Experts (MoE) and dense architectures. They utilize hybrid thinking modes, allowing users to balance response speed and reasoning depth for tasks, controllable via parameters or tags. Developed through a multi-stage post-training pipeline, Qwen3 is trained on a significantly expanded dataset of approximately 36 trillion tokens across 119 languages. This enhances its multilingual support for global applications. The models also feature improved agentic capabilities, notably excelling in tool calling, which increases their utility for complex, interactive tasks.
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
DeepSeek-Prover-V2
01/05/2025
ZARZA We are Zarza, the prestigious firm behind major projects in information technology.