Online Speculative Decoding

12/10/2023 25 min


Episode Synopsis

Online speculative decoding is introduced as a technique to improve the efficacy of speculative decoding in large language models. It continually updates the draft models using spare computational capacity, so they predict the target model's outputs more accurately. More draft tokens are then accepted, which reduces latency.
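As a rough illustration only, the toy Python sketch below shows greedy speculative decoding in which the draft is refreshed from the tokens the target model actually produced. Everything concrete is an assumption and not the paper's method. The "target model" is a fixed arithmetic rule, the "draft model" is a bigram lookup table, and the online update simply overwrites table entries in place of retraining a neural draft model. The names target_next, BigramDraft, speculative_step and generate are hypothetical.

```python
# Toy sketch of speculative decoding with an online-updated draft model.
# Assumptions (hypothetical, not from the paper): a 10-token vocabulary,
# a deterministic rule standing in for the target LLM, and a bigram
# lookup table standing in for the draft model.

VOCAB_SIZE = 10


def target_next(context):
    """Hypothetical target model: greedy next token from the last token."""
    return (3 * context[-1] + 1) % VOCAB_SIZE


def greedy_reference(prompt, n_tokens):
    """Plain greedy decoding with the target model alone, for comparison."""
    context = list(prompt)
    for _ in range(n_tokens):
        context.append(target_next(context))
    return context[len(prompt):]


class BigramDraft:
    """Hypothetical draft model: maps the last token to a guessed next token."""

    def __init__(self):
        self.table = {}

    def propose(self, context, k):
        tokens, last = [], context[-1]
        for _ in range(k):
            last = self.table.get(last, 0)  # unseen tokens default to guessing 0
            tokens.append(last)
        return tokens

    def update(self, prev, token):
        self.table[prev] = token


def speculative_step(context, draft, k):
    """One greedy verify step: returns (new tokens, number of draft tokens accepted)."""
    proposal = draft.propose(context, k)
    ctx = list(context)
    accepted = []
    # A real system scores all k proposals in one batched target forward pass;
    # here they are checked one at a time for clarity.
    for token in proposal:
        if token != target_next(ctx):
            break
        accepted.append(token)
        ctx.append(token)
    # The target always contributes one token: the correction after a mismatch,
    # or a bonus token when every proposal was accepted.
    return accepted + [target_next(ctx)], len(accepted)


def generate(prompt, n_tokens, draft, k=4, online=True):
    """Decode n_tokens; returns (tokens, draft acceptance rate)."""
    context = list(prompt)
    proposed = accepted = 0
    while len(context) - len(prompt) < n_tokens:
        new_tokens, n_acc = speculative_step(context, draft, k)
        proposed += k
        accepted += n_acc
        if online:
            # Online update: fit the draft to the target's verified tokens.
            prev = context[-1]
            for token in new_tokens:
                draft.update(prev, token)
                prev = token
        context.extend(new_tokens)
    return context[len(prompt):len(prompt) + n_tokens], accepted / proposed


offline_tokens, offline_rate = generate([1], 40, BigramDraft(), online=False)
online_tokens, online_rate = generate([1], 40, BigramDraft(), online=True)
print(f"offline acceptance: {offline_rate:.2f}, online acceptance: {online_rate:.2f}")
```

Because every emitted token is checked against the target, both runs return exactly the target's own greedy output. Only the share of draft tokens accepted changes, and in this toy the online-updated draft is accepted far more often than the frozen one, which is the effect that drives the latency saving.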

https://arxiv.org/abs/2310.07177

YouTube: https://www.youtube.com/@ArxivPapers

TikTok: https://www.tiktok.com/@arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016

Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
