"Models tell you what to discard"
Episode Synopsis
This paper introduces FastGen, a method that uses lightweight model profiling and adaptive key-value (KV) caching to significantly reduce the memory footprint of LLM inference without noticeable loss in generation quality.
Read full paper: https://arxiv.org/abs/2310.01801
Tags: Systems and Performance, Machine Learning, Optimization