Episode 60: DeepSeek Models Explained Part I

28/01/2025 36 min Season 2 Episode 60


Episode Synopsis

What if AI could match enterprise-grade performance at a fraction of the cost? In this episode, we dive deep into DeepSeek, the groundbreaking family of open-source models challenging tech giants at roughly 95% lower cost. From innovative training optimizations to revolutionary data curation, discover how a resource-constrained startup is redefining what's possible in AI.
🎯 Episode Highlights:

Beyond cost-cutting: How DeepSeek matches top-tier AI performance

Game-changing memory optimization and pipeline parallelism

Inside the technology: Zero-redundancy (ZeRO) training and repository-level dependency parsing

The future of efficient, accessible AI development


Whether you're an ML engineer or an AI enthusiast, learn how clever optimization is democratizing advanced AI capabilities. No GPU farm needed!
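To ground the zero-redundancy (ZeRO) idea from the references below: instead of every data-parallel worker keeping a full copy of the optimizer state, each worker owns a 1/N shard, updates it, and the full parameters are re-assembled afterwards. The following is a minimal, single-process NumPy sketch of that idea; the worker count, sizes, and names are hypothetical, and a real system would use torch.distributed or DeepSpeed rather than a plain loop.

import numpy as np

N_WORKERS = 4      # data-parallel world size (hypothetical)
N_PARAMS = 16      # toy parameter count, divisible by N_WORKERS
LR, BETA = 0.1, 0.9
SHARD = N_PARAMS // N_WORKERS

params = np.ones(N_PARAMS, dtype=np.float32)  # full copy on every worker
# ZeRO stage 1: each worker stores momentum only for its own shard,
# cutting optimizer-state memory by a factor of N_WORKERS.
momentum = [np.zeros(SHARD, dtype=np.float32) for _ in range(N_WORKERS)]

grads = np.random.randn(N_WORKERS, N_PARAMS).astype(np.float32)
avg_grad = grads.mean(axis=0)  # stands in for an all-reduce across workers

new_shards = []
for rank in range(N_WORKERS):
    lo, hi = rank * SHARD, (rank + 1) * SHARD
    momentum[rank] = BETA * momentum[rank] + avg_grad[lo:hi]
    new_shards.append(params[lo:hi] - LR * momentum[rank])

# All-gather: every worker re-assembles the full updated parameter vector.
params = np.concatenate(new_shards)
print(params)

Sharding the momentum buffer this way is what cuts per-worker optimizer memory by a factor of N_WORKERS; ZeRO stages 2 and 3 extend the same partitioning to gradients and parameters.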

References for the main topic:

[2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

[2401.14196] DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

[2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

[2412.19437] DeepSeek-V3 Technical Report

[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSpeed blog: ZeRO-3 Offload (https://www.deepspeed.ai/2021/03/07/zero3-offload.html)

[1910.02054] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

[2205.05198] Reducing Activation Recomputation in Large Transformer Models

[2406.03488] Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training


