DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis

08/08/2025 17 min


Episode Synopsis

This episode reviews a document dated January 27, 2025, in which Daniel and Michael of Unsloth detail their work quantizing DeepSeek-R1's 671B-parameter model, reducing its size by 80% to 131GB while preserving functionality. They achieved this with dynamic quantization: selectively applying higher bitrates to crucial layers and lower bitrates to the less sensitive MoE layers, in contrast to naive quantization methods that render the model unusable. The text explains how to run the quantized versions, covering hardware requirements, performance benchmarks, and chat-template considerations. It also offers a guide for local execution on various systems, including specific instructions for GPU and Apple devices, and outlines the use of Ollama/Open WebUI.

Source: https://unsloth.ai/blog/deepseekr1-dynamic
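The selective-bitrate idea can be illustrated with a minimal sketch. The layer-name patterns and bit choices below are hypothetical stand-ins, not Unsloth's actual rules; the point is only the mechanism of assigning higher precision to sensitive layers and aggressive ~1.58-bit precision to the bulky MoE expert weights:

```python
# Hypothetical sketch of dynamic quantization: route each layer to a bit
# width based on its name. Patterns and widths are illustrative only.

def choose_bits(layer_name: str) -> float:
    """Return an assumed target bit width for a layer, matched by name."""
    # Embeddings, output head, and attention stay at higher precision.
    if any(key in layer_name for key in ("embed", "lm_head", "attn")):
        return 4.0
    # Down-projections are often more quantization-sensitive than
    # up/gate projections, so give them an intermediate width.
    if "down_proj" in layer_name:
        return 2.0
    # The vast majority of parameters sit in MoE expert weights,
    # which tolerate very aggressive quantization.
    if "experts" in layer_name:
        return 1.58
    return 4.0

def average_bits(layers: dict[str, int]) -> float:
    """Parameter-count-weighted average bit width across a model."""
    total = sum(layers.values())
    return sum(choose_bits(name) * count
               for name, count in layers.items()) / total
```

Because the expert weights dominate the parameter count, the weighted average lands close to the expert bit width, which is how an 80% size reduction is possible without quantizing every layer aggressively.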
