Listen "Qwen-1"
Episode Synopsis
Qwen-1, also known as QWEN, is a series of large language models that includes base pretrained models, chat models, and specialized models for coding and math. The models are trained on 3 trillion tokens, tokenized with byte pair encoding (BPE), and use a modified Transformer architecture with untied input and output embeddings and rotary positional embeddings. The chat models (QWEN-CHAT) are aligned to human preferences using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). QWEN models perform strongly, outperforming many open-source models, but they generally lag behind proprietary models such as GPT-4.
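For listeners who want a feel for the rotary positional embeddings mentioned in the synopsis, here is a minimal sketch in plain Python. It is not QWEN's actual implementation: the function names `rope` and `dot`, the adjacent-pair layout, and the base of 10000 are assumptions chosen for illustration. The idea shown is that each pair of vector components is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on how far apart the two positions are.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Illustrative only: the pairing of adjacent components and the default
    base are assumptions, not details taken from the QWEN models.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.5, 0.3, -0.7, 1.0]
    # Both pairs of positions are 2 apart, so the two scores should match.
    print(dot(rope(q, 3), rope(k, 1)))
    print(dot(rope(q, 7), rope(k, 5)))
```

Because the rotation preserves vector length, position information is carried entirely in the angle between query and key, which is why the attention score reflects relative rather than absolute position.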
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025