Kimi k1.5
Episode Synopsis
Kimi k1.5 is a multimodal LLM trained with reinforcement learning (RL). Key aspects include: long-context scaling to 128k tokens, with performance improving as context length increases; improved policy optimization using a variant of online mirror descent; and a simple framework that enables planning and reflection without complex methods. It uses a reference policy in its off-policy RL approach, and long2short methods such as model merging and DPO to transfer knowledge from long-CoT models to short-CoT models, achieving state-of-the-art reasoning performance. The model is jointly trained on text and vision data.
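One of the long2short methods mentioned above, model merging, can be illustrated by simply interpolating the parameters of a long-CoT model and a short-CoT model. The sketch below is illustrative only: the function name `merge_weights`, the plain-dict parameter representation, and the 50/50 interpolation ratio are assumptions for the example, not details from the Kimi k1.5 recipe.

```python
# Hypothetical sketch of long2short model merging: interpolate each parameter
# of a long-CoT model with the corresponding parameter of a short-CoT model.
# Parameters are represented as name -> list-of-floats dicts for simplicity;
# a real implementation would operate on framework weight tensors.

def merge_weights(long_state, short_state, alpha=0.5):
    """Return a state dict interpolated between two models.

    alpha is the weight given to the long-CoT model's parameters.
    """
    merged = {}
    for name, long_w in long_state.items():
        short_w = short_state[name]
        merged[name] = [alpha * lw + (1 - alpha) * sw
                        for lw, sw in zip(long_w, short_w)]
    return merged

# Toy example: each merged parameter is the elementwise average.
long_state = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5]}
short_state = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.5]}
print(merge_weights(long_state, short_state))
# {'layer0.weight': [2.0, 3.0], 'layer0.bias': [1.0]}
```

The appeal of merging as a long2short technique is that it needs no extra training: the merged model inherits some of the long-CoT model's reasoning quality while staying closer to the short-CoT model's concise outputs.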
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2 (22/07/2025)
Mixture-of-Recursions (MoR) (18/07/2025)
MeanFlow (10/07/2025)
Mamba (10/07/2025)
LLM Alignment (14/06/2025)
Why We Think (20/05/2025)
Deep Research (12/05/2025)
vLLM (04/05/2025)
Qwen3: Thinking Deeper, Acting Faster (04/05/2025)