Whisper: Robust Speech Recognition via Large-Scale Weak Supervision

04/10/2024 10 min Season 2 Episode 3

Listen "Whisper: Robust Speech Recognition via Large-Scale Weak Supervision"

Episode Synopsis

This research paper introduces Whisper, a speech recognition system trained on a massive, weakly supervised dataset of 680,000 hours of audio. The paper argues that scaling weakly supervised training has been underappreciated in speech recognition, and that Whisper's robust zero-shot performance demonstrates its ability to generalize across domains, languages, and tasks, approaching or surpassing human accuracy in some settings. The authors explore the system's scaling properties, in terms of both model size and dataset size, and analyze the impact of multitask and multilingual training. They also discuss Whisper's performance on language identification and its robustness to noise. The paper concludes with a discussion of limitations and directions for future work.
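As an aside, the zero-shot transcription and language identification described in the episode can be tried directly with the open-source openai-whisper Python package released alongside the paper. The sketch below is illustrative rather than taken from the episode; "audio.mp3" is a placeholder file name.

    # Minimal sketch using the open-source openai-whisper package
    # (pip install openai-whisper). "audio.mp3" is a placeholder path.
    import whisper

    # Load one of the released checkpoints; larger models trade speed for accuracy.
    model = whisper.load_model("base")

    # Zero-shot transcription: no fine-tuning on the target domain is required.
    result = model.transcribe("audio.mp3")

    print(result["language"])  # language identified by the model
    print(result["text"])      # transcribed text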
