Ep 22: How small LLMs are outperforming GPT-3 using a Mixture of Experts

09/04/2024 51 min

Listen "Ep 22: How small LLMs are outperforming GPT3 using a Mixture of Experts"

Episode Synopsis

Episode 22: How small LLMs (47B) are outperforming GPT-3 (175B) using a Mixture of Experts (MoE)
AI News:

arXiv:2402.05120 - More Agents Is All You Need

arXiv:2403.16971 - AIOS: LLM Agent Operating System

arXiv:2404.02258 - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Devika GitHub Repository - Devika: An Agentic AI Software Engineer

T-Rex GitHub Repository - T-Rex: A Large-Scale Relation Extraction Framework

WSJ Article on Cognition Labs - Peter Thiel-backed AI startup Cognition Labs seeks a $2 billion valuation


References for main topic:

arXiv:1701.06538 - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

arXiv:2006.16668 - GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

arXiv:2101.03961 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

arXiv:2112.06905 - GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

arXiv:2202.08906 - ST-MoE: Designing Stable and Transferable Sparse Expert Models

arXiv:2211.15841 - MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

arXiv:2401.04088 - Mixtral of Experts

arXiv:1511.07543 - Convergent Learning: Do different neural networks learn the same representations?
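
To illustrate the episode's main claim, below is a minimal PyTorch sketch of top-2 sparse MoE routing. It is a toy layer with made-up dimensions (ToyMoELayer, d_model=64, 8 experts), loosely in the spirit of the Switch Transformers and Mixtral papers listed above, not their actual implementations. The router scores all experts per token but only the top-k experts are executed, so the parameters active per token are a small fraction of the layer's total.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer with top-k routing (illustrative only)."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (num_tokens, d_model)
        gate_logits = self.router(x)                     # (num_tokens, num_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                      # only top_k experts run per token
            expert_ids = chosen[:, k]
            for e in expert_ids.unique().tolist():
                mask = expert_ids == e                   # tokens routed to expert e
                out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out


tokens = torch.randn(10, 64)                             # 10 tokens, d_model=64
print(ToyMoELayer()(tokens).shape)                       # torch.Size([10, 64])
```

This is why total and active parameter counts diverge: Mixtral, for example, routes each token to 2 of 8 experts, so only roughly 13B of its roughly 47B parameters are used for any given token.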


