Qwen-1

01/02/2025 14 min

Episode Synopsis

Qwen-1, also known as QWEN, is a series of large language models that includes base pretrained models, chat models, and specialized models for coding and math. The models are trained on a dataset of 3 trillion tokens, tokenized with byte-pair encoding (BPE), and use a modified Transformer architecture with untied embeddings and rotary positional embeddings. The chat models (QWEN-CHAT) are aligned to human preferences using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). QWEN models perform strongly, outperforming many open-source models, though they generally lag behind proprietary models such as GPT-4.
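
For listeners who want a concrete picture of the rotary positional embeddings mentioned in the synopsis, here is a minimal sketch in plain Python. It is not taken from the QWEN codebase: the function names `rotary_embed` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are all assumptions chosen for illustration. The idea it shows is that each pair of vector components is rotated by an angle proportional to the token position, so the dot product of a rotated query and key depends only on their relative offset.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[i], vec[i+1]) by a position-dependent angle.

    Illustrative sketch only: the pairing scheme and the `base` default are
    assumptions, not details confirmed by the episode.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Lower-indexed pairs rotate faster; later pairs rotate more slowly.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors, used only to exercise the functions.
query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.25]

# Positions (5, 3) and (2, 0) share the same offset of 2, so the two
# attention scores agree up to floating-point error.
score_a = dot(rotary_embed(query, 5), rotary_embed(key, 3))
score_b = dot(rotary_embed(query, 2), rotary_embed(key, 0))
```

Because a rotation changes a vector's direction but not its length, this scheme encodes position without altering the magnitude of the query or key.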

More episodes of the podcast Large Language Model (LLM) Talk

Kimi K2 22/07/2025

Mixture-of-Recursions (MoR) 18/07/2025

MeanFlow 10/07/2025

Mamba 10/07/2025

LLM Alignment 14/06/2025

Why We Think 20/05/2025

Deep Research 12/05/2025

vLLM 04/05/2025

Qwen3: Thinking Deeper, Acting Faster 04/05/2025

RAGEN: train and evaluate LLM agents using multi-turn RL 03/05/2025
