Jointly Training Large Autoregressive Multimodal Models

28/09/2023 25 min

Episode Synopsis

The paper introduces the Joint Autoregressive Mixture (JAM) framework, which combines text and image generation models to create high-quality multimodal outputs. It also presents a data-efficient instruction-tuning strategy for mixed-modal generation tasks.

https://arxiv.org/abs/2309.15564

YouTube: https://www.youtube.com/@ArxivPapers

PODCASTS:
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
