Latest episodes of the podcast Mechanical Dreams
Understanding WSD Learning Rates
18/11/2024
The Road Less Scheduled
01/11/2024
Learning-Rate-Free Learning by D-Adaptation
31/10/2024
Scaling FP8 Training to Trillion Token LLMs
30/10/2024
A Survey on Model MoErging
28/10/2024
Liquid Time-constant Networks
27/10/2024
A Spectral Condition for Feature Learning
25/10/2024
Don't decay the learning rate
24/10/2024
OLMoE
23/10/2024
An Empirical Model of Large Batch Training
22/10/2024