Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO

10/11/2023 9 min

Episode Synopsis

The paper illustrates speculative sampling as a way to reduce text-generation latency and compares it with standard autoregressive sampling. The authors also discuss combining it with model-based optimizations and provide a Jupyter notebook with sample executions.

https://arxiv.org/abs/2311.04951

YouTube: https://www.youtube.com/@ArxivPapers

TikTok: https://www.tiktok.com/@arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016

Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
