Model/Knowledge Distillation
Episode Synopsis
Model distillation, also called knowledge distillation, is a technique for transferring knowledge from a cumbersome model, such as a large neural network or an ensemble of models, to a smaller, more efficient model. The smaller model is trained on "soft targets," the class probabilities produced by the larger model, rather than the usual "hard targets" of correct class labels. Soft targets carry more information per example than hard labels: the relative probabilities assigned to incorrect classes reveal how the cumbersome model generalizes and which classes it treats as similar. A temperature parameter in the softmax softens the probability distributions, so that the small probabilities on incorrect classes become large enough for the smaller model to learn from. Training this way improves the smaller model's generalization, allowing it to reach performance comparable to the larger model with less computation.
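To make the role of temperature and soft targets concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the episode's own code: the function names, the `alpha` weighting between the soft and hard terms, and the example logits are all hypothetical. Mixing in a hard-label term and scaling the soft term by the squared temperature are common choices in distillation setups, included here as assumptions.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term (alpha is an assumed knob)."""
    # Soft targets: the teacher's class probabilities at the chosen temperature.
    soft_targets = softmax_with_temperature(teacher_logits, temperature)
    student_soft = softmax_with_temperature(student_logits, temperature)
    # Cross-entropy between the teacher's and the student's softened distributions.
    soft_loss = -sum(p * math.log(q) for p, q in zip(soft_targets, student_soft))
    # Ordinary cross-entropy against the correct class, at temperature 1.
    student_hard = softmax_with_temperature(student_logits, 1.0)
    hard_loss = -math.log(student_hard[hard_label])
    # Scale the soft term by T^2 so its gradients stay comparable as T changes.
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss

if __name__ == "__main__":
    teacher_logits = [6.0, 2.0, 1.0]  # hypothetical teacher outputs for one example
    print(softmax_with_temperature(teacher_logits, temperature=1.0))
    print(softmax_with_temperature(teacher_logits, temperature=4.0))
    print(distillation_loss([5.0, 2.5, 0.5], teacher_logits, hard_label=0))
```

Running the script prints the teacher's distribution at temperature 1 and at temperature 4. The second is noticeably flatter, which is what exposes the relative ranking of the incorrect classes to the smaller model.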
More episodes of the podcast Large Language Model (LLM) Talk
Kimi K2
22/07/2025
Mixture-of-Recursions (MoR)
18/07/2025
MeanFlow
10/07/2025
Mamba
10/07/2025
LLM Alignment
14/06/2025
Why We Think
20/05/2025
Deep Research
12/05/2025
vLLM
04/05/2025
Qwen3: Thinking Deeper, Acting Faster
04/05/2025