Listen "GPT-2"
Episode Synopsis
The GPT-2 language model is a large, transformer-based model using a decoder-only architecture. It predicts the next word in a sequence, much like a keyboard app's next-word suggestion. GPT-2 is auto-regressive: each predicted token is appended to the input for the next step. It uses masked self-attention, which attends only to previous tokens, unlike BERT's bidirectional self-attention. Input tokens are processed through a stack of decoder blocks, each containing a self-attention layer and a feed-forward neural network layer. The self-attention mechanism uses query, key, and value vectors to build context. GPT-2 has applications in machine translation, summarization, and music generation.
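To make the query/key/value description above concrete, here is a minimal NumPy sketch of masked (causal) self-attention for a single head. The function name, shapes, and toy inputs are illustrative assumptions for this sketch, not GPT-2's actual code.

```python
# Minimal sketch of masked (causal) self-attention, as described in the synopsis.
# Shapes and names are illustrative; GPT-2 uses multiple heads and learned weights.
import numpy as np

def masked_self_attention(x, W_q, W_k, W_v):
    """x: (seq_len, d_model) token embeddings; W_q/W_k/W_v: (d_model, d_head)."""
    q = x @ W_q                      # query vector for each token
    k = x @ W_k                      # key vector for each token
    v = x @ W_v                      # value vector for each token
    d_head = q.shape[-1]

    # Scaled dot-product scores between every query and every key.
    scores = q @ k.T / np.sqrt(d_head)          # (seq_len, seq_len)

    # Causal mask: each position may only attend to itself and earlier
    # positions, which is what makes the model auto-regressive.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Softmax over keys, then mix the value vectors into context vectors.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v               # (seq_len, d_head)

# Toy usage: 4 tokens, model width 8, head width 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(masked_self_attention(x, W_q, W_k, W_v).shape)  # (4, 4)
```

In a full decoder block this attention output would pass through a feed-forward layer, and the block would be stacked many times before the final next-token prediction.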
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025