Listen "Whisper: Robust Speech Recognition via Large-Scale Weak Supervision"
Episode Synopsis
This research paper introduces Whisper, a speech recognition system trained on a massive, weakly supervised dataset of 680,000 hours of audio. The paper argues that scaling weakly supervised training has been underappreciated in speech recognition and that Whisper's robust zero-shot performance demonstrates its ability to generalize across domains, languages, and tasks, approaching human accuracy and robustness in some settings. The authors explore the system's scaling properties in terms of both model size and dataset size, and they analyze the impact of multitask and multilingual training. They also discuss Whisper's performance on language identification and its robustness to noise. The paper concludes with a discussion of potential limitations and areas for future work.
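As a concrete illustration of the zero-shot usage discussed in the episode, here is a minimal sketch using the open-source openai-whisper Python package. The model size ("base") and the file name audio.mp3 are placeholder assumptions for illustration, not details from the episode.

```python
# A minimal sketch of zero-shot transcription with the openai-whisper
# package (pip install openai-whisper). The model size ("base") and the
# input file name are illustrative placeholders.
import whisper

# Download (on first use) and load a pretrained multilingual checkpoint.
model = whisper.load_model("base")

# Transcribe an audio file zero-shot, with no fine-tuning on the target domain.
result = model.transcribe("audio.mp3")
print(result["text"])

# Whisper can also identify the spoken language before transcribing.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)  # Whisper operates on 30-second windows
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
```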
More episodes of the podcast Artificial Discourse
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
19/11/2024
A Survey of Small Language Models
12/11/2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
11/11/2024
The Llama 3 Herd of Models
10/11/2024
Kolmogorov-Arnold Network (KAN)
09/11/2024