Whisper Large v3
OpenAI
OpenAI's state-of-the-art automatic speech recognition (ASR) model, with 1.55B parameters and support for 99+ languages. It was trained on 1M hours of weakly labeled audio plus 4M hours of audio pseudo-labeled with Whisper large-v2. It uses 128 Mel frequency bins (vs. 80 in previous versions) and was trained for 2.0 epochs. It achieves an average word error rate (WER) of 7.4%, and 97.9% word accuracy on LibriSpeech. It shows a 10-20% error reduction vs. large-v2 and a 72% WER reduction vs. the prior MLPerf ASR model (RNN-T). The model performs automatic language identification, generates phrase-level timestamps, and handles punctuation and capitalization. It is strongest on high-resource languages (English, Spanish, French, German). Its open-source release allows self-hosting and commercial use.
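The WER and error-reduction figures above are easier to interpret with the underlying arithmetic in view. The following is a minimal sketch, not the evaluation code behind these numbers: the function names `word_error_rate` and `relative_error_reduction` are hypothetical, and plain whitespace tokenization is an assumption (published evaluations usually normalize case and punctuation first). WER is the word-level edit distance divided by the number of reference words. If word accuracy is taken as 1 minus WER, which is an assumption about how the figure was derived, 97.9% accuracy corresponds to roughly 2.1% WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substituted word out of five reference words.
    wer = word_error_rate("the quick brown fox jumps",
                          "the quick brown box jumps")
    print(f"WER: {wer:.1%}, word accuracy: {1 - wer:.1%}")
    # Illustrative numbers only: going from 10% to 8% WER is a 20% relative reduction.
    print(f"relative reduction: {relative_error_reduction(0.10, 0.08):.0%}")
```

Note that an error reduction such as "10-20% vs. large-v2" is relative in this sense: it scales the baseline WER rather than subtracting percentage points from it.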
Strengths
Caveats
Capabilities