Dion: Distributed Orthonormalized Updates

06/01/2026 18 min

Episode Synopsis

In this episode:

• The GPU Bill Blues: Professor Norris laments the exorbitant cost of training large models, setting the stage for Linda to introduce the episode's focus: 'Dion: Distributed Orthonormalized Updates' by researchers from Microsoft and Harvard.
• Muon's Heavy Lifting: Linda explains the predecessor, the Muon optimizer, and the benefits of its orthonormalized updates. Norris questions why a new method is needed, leading to a discussion of how Newton-Schulz iterations become a communication bottleneck in sharded distributed training.
• Rethinking Linear Algebra: Linda details Dion's core innovation: replacing full matrix reconstruction with amortized power iteration on a momentum buffer (see the sketch after this list). Norris is skeptical about the math, but Linda explains how this integrates cleanly with weight sharding.
• The Magic of Error Feedback: The hosts discuss the 'rank-fraction' parameter and how low-rank updates save compute. Linda explains the crucial role of 'error feedback' in maintaining accuracy, finally winning Norris over.
• Lazy Updates and CPU Offloading: A look at Dion's algorithmic flexibility, including the 'Lazy-Dion' and CPU-offloading variants. They discuss experimental results showing Dion matching Muon's performance with significantly lower wall-clock time.
• Future-Proofing Optimization: Professor Norris concedes the elegance of the solution. The pair wraps up with thoughts on how Dion might become the standard for training next-generation foundation models.
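To make the ideas discussed in the episode concrete, here is a minimal single-device sketch of a Dion-style step: one amortized power-iteration pass against the momentum buffer, a low-rank orthonormalized update, and error feedback that keeps the unexplained residual in the buffer. The function name, the decay constant `mu`, the column normalization, and the `sqrt(m/n)` scaling are illustrative assumptions rather than the paper's reference implementation, and the distributed sharding that motivates Dion is omitted.

```python
import torch

def dion_style_step(X, G, M, Q, lr=0.01, mu=0.95):
    """Hedged sketch of a Dion-style orthonormalized update for one 2-D layer.

    X: (m, n) weight matrix          G: gradient, same shape as X
    M: persistent momentum buffer    Q: persistent (n, r) right factor,
                                        reused across steps (amortized power iteration)
    """
    B = M + G                                   # buffer seen by this step

    # One power-iteration pass, warm-started from the previous step's Q.
    P, _ = torch.linalg.qr(B @ Q)               # (m, r) column-orthonormal left factor
    R = B.T @ P                                 # (n, r) right factor

    # Error feedback: the buffer keeps everything the rank-r factors missed,
    # plus a mu-weighted share of what they did capture.
    M.copy_(B - (1.0 - mu) * (P @ R.T))

    # Column-normalize R to serve as the right factor of the update and
    # as the warm start for the next step.
    Q_new = R / (R.norm(dim=0, keepdim=True) + 1e-8)
    Q.copy_(Q_new)

    # Orthonormalized low-rank update with a Muon-like sqrt(m/n) shape scaling.
    m, n = X.shape
    X.sub_(lr * (m / n) ** 0.5 * (P @ Q_new.T))


# Illustrative usage with the rank set by a rank fraction of 1/16:
m, n = 1024, 4096
r = n // 16
X = torch.randn(m, n)
M = torch.zeros(m, n)
Q, _ = torch.linalg.qr(torch.randn(n, r))       # random orthonormal warm start
dion_style_step(X, torch.randn(m, n), M, Q)     # one step on a dummy gradient
```

Because only the thin factors P and Q ever need to be exchanged or orthonormalized, a sketch like this hints at why the approach avoids the full Newton-Schulz reconstruction that becomes a communication bottleneck under weight sharding.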