MiniMax-01
Episode Synopsis
MiniMax-01 is a series of large language and vision-language models that use lightning attention and a mixture of experts (MoE) to achieve efficient long-context processing. The models, MiniMax-Text-01 and MiniMax-VL-01, match the performance of top-tier models such as GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context windows, reaching up to 4 million tokens during inference. The models use a hybrid architecture that combines linear (lightning) attention with softmax attention, and they are trained on large datasets of text, code, and image-caption pairs. They also use a multi-stage training process with supervised fine-tuning and reinforcement learning to optimize their capabilities in long-context and real-world scenarios.
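To make the hybrid design concrete, below is a minimal PyTorch sketch, not the MiniMax-01 implementation, of a stack that interleaves linear-attention blocks with an occasional softmax-attention block, each followed by a small mixture-of-experts feed-forward layer. All module names, dimensions, the ELU feature map, the top-1 routing, and the 7:1 linear-to-softmax layer ratio are illustrative assumptions, not details confirmed by the episode.

```python
# Illustrative sketch only (assumptions noted above), not MiniMax-01's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearAttention(nn.Module):
    """Kernel-based linear attention: O(n) in sequence length (non-causal, for simplicity)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1                     # positive feature map
        kv = torch.einsum("bhnd,bhne->bhde", k, v)            # sum over positions of phi(k) v^T
        z = 1 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
        return self.out(out.transpose(1, 2).reshape(b, n, d))


class MoEFeedForward(nn.Module):
    """Toy mixture-of-experts FFN with top-1 routing over a few expert MLPs."""
    def __init__(self, dim, num_experts=4, hidden=256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):
        probs = self.gate(x).softmax(dim=-1)                  # (batch, seq, experts)
        top = probs.argmax(dim=-1)                            # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = probs[mask][:, i:i + 1] * expert(x[mask])
        return out


class HybridBlock(nn.Module):
    """Pre-norm block: linear or softmax attention, then an MoE feed-forward."""
    def __init__(self, dim, use_softmax_attn):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.use_softmax = use_softmax_attn
        self.attn = (nn.MultiheadAttention(dim, 4, batch_first=True)
                     if use_softmax_attn else LinearAttention(dim))
        self.ffn = MoEFeedForward(dim)

    def forward(self, x):
        h = self.norm1(x)
        a = self.attn(h, h, h, need_weights=False)[0] if self.use_softmax else self.attn(h)
        x = x + a
        return x + self.ffn(self.norm2(x))


# Assumed ratio: one softmax-attention block after every seven linear-attention blocks.
model = nn.Sequential(*[HybridBlock(64, use_softmax_attn=(i % 8 == 7)) for i in range(8)])
tokens = torch.randn(2, 128, 64)                              # (batch, seq_len, dim)
print(model(tokens).shape)                                    # torch.Size([2, 128, 64])
```

The point of the sketch is the scaling argument: the linear-attention blocks cost O(n) in sequence length, so only the occasional softmax block pays the quadratic price, which is what makes very long context windows tractable in this kind of hybrid stack.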
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025