Ep 22: How small LLMs are outperforming GPT-3 using a Mixture of Experts

09/04/2024 51 min

Listen "Ep 22: How small LLMs are outperforming GPT3 using a Mixture of Experts"

Episode Synopsis

Episode 22: How small LLMs (47B) are outperforming GPT-3 (175B) using a Mixture of Experts (MoE)
AI News:

arXiv:2402.05120 - More Agents Is All You Need

arXiv:2403.16971 - AIOS: LLM Agent Operating System

arXiv:2404.02258 - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Devika GitHub Repository - Devika: An Agentic AI Software Engineer

T-Rex GitHub Repository - T-Rex: A Large-Scale Relation Extraction Framework

WSJ Article on Cognition Labs - Peter Thiel-backed AI startup Cognition Labs seeks a $2 billion valuation


References for main topic:

arXiv:1701.06538 - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

arXiv:2006.16668 - GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

arXiv:2101.03961 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

arXiv:2112.06905 - GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

arXiv:2202.08906 - ST-MoE: Designing Stable and Transferable Sparse Expert Models

arXiv:2211.15841 - MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

arXiv:2401.04088 - Mixtral of Experts

arXiv:1511.07543 - Convergent Learning: Do different neural networks learn the same representations?
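
To illustrate the episode's main claim, below is a minimal PyTorch sketch of top-2 sparse MoE routing. It is a toy layer with made-up dimensions (ToyMoELayer, d_model=64, 8 experts), loosely in the spirit of the Switch Transformers and Mixtral papers listed above, not their actual implementations. The router scores all experts per token but only the top-k experts are executed, so the parameters active per token are a small fraction of the layer's total.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer with top-k routing (illustrative only)."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (num_tokens, d_model)
        gate_logits = self.router(x)                     # (num_tokens, num_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                      # only top_k experts run per token
            expert_ids = chosen[:, k]
            for e in expert_ids.unique().tolist():
                mask = expert_ids == e                   # tokens routed to expert e
                out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out


tokens = torch.randn(10, 64)                             # 10 tokens, d_model=64
print(ToyMoELayer()(tokens).shape)                       # torch.Size([10, 64])
```

This is why total and active parameter counts diverge: Mixtral, for example, routes each token to 2 of 8 experts, so only roughly 13B of its roughly 47B parameters are used for any given token.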


