Listen "Ep 22: How small LLMs are outperforming GPT3 using a Mixture of Experts"
Episode Synopsis
Episode 22: How small LLMs (47B) are outperforming GPT-3 (175B) using a Mixture of Experts (MoE)
AI News:
arXiv:2402.05120 - More Agents Is All You Need
arXiv:2403.16971 - AIOS: LLM Agent Operating System
arXiv:2404.02258 - Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models
Devika GitHub Repository - Devika: An Agentic AI Software Engineer
T-Rex GitHub Repository - T-Rex: A Large-Scale Relation Extraction Framework
WSJ article on Cognition Labs - Peter Thiel-backed AI startup Cognition Labs seeks a $2 billion valuation
References for main topic:
arXiv:1701.06538 - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
arXiv:2006.16668 - GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
arXiv:2101.03961 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
arXiv:2112.06905 - GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
arXiv:2202.08906 - ST-MoE: Designing Stable and Transferable Sparse Expert Models
arXiv:2211.15841 - MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
arXiv:2401.04088 - Mixtral of Experts
arXiv:1511.07543 - Convergent Learning: Do different neural networks learn the same representations?
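The papers above share one core idea: a router sends each token to a small subset of expert feed-forward networks, so only a fraction of the model's parameters is active per token. Mixtral, for example, has roughly 47B total parameters but activates only about 13B per token by using 2 of its 8 experts. The following is a minimal, illustrative sketch of top-2 gating in PyTorch; the layer sizes, class name, and the simple per-expert routing loop are assumptions made for clarity, not code from any of the referenced papers.

# Minimal sketch of a sparse Mixture-of-Experts feed-forward layer with
# top-2 gating, in the spirit of arXiv:1701.06538 and arXiv:2401.04088.
# Sizes and names are illustrative assumptions, not from the episode.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Router: scores each token against every expert.
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.gate(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        # Each token is processed by only top_k experts, so most expert
        # parameters stay idle for any given token -- the "sparse" in sparse MoE.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

# Example: 8 experts with 2 active per token means roughly a quarter of the
# expert parameters are used for any single token.
layer = TopKMoE()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)   # torch.Size([16, 512])

Production systems such as MegaBlocks replace the per-expert Python loop with block-sparse kernels and add load-balancing losses during training, but the routing logic is the same in spirit.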