Learn more about Whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Training on a dataset this large and diverse makes it more robust to accents and background noise than existing approaches, in both transcription and translation. Its architecture is a simple end-to-end encoder-decoder Transformer, which makes it easy to integrate into practical applications and a useful basis for further research on robust speech processing.

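Because Whisper is not fine-tuned to any single dataset, it does not beat models that specialize in a particular benchmark such as LibriSpeech. Measured zero-shot across many diverse datasets, however, it is much more robust and makes about 50% fewer errors than those specialized models. Whisper also translates non-English speech into English text, and in a zero-shot setting it outperforms the supervised state of the art on the CoVoST2 to-English translation benchmark.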
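To make the error figure concrete: speech recognition errors are commonly reported as word error rate (WER), the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below assumes that this is the metric behind the 50% figure; the function names and the sample numbers are illustrative only and are not results from Whisper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under these definitions, a system that goes from a WER of 0.10 to 0.05 on the same test set has a relative error reduction of 0.5, which is what "50% fewer errors" means. Real evaluations usually also normalize punctuation and number formatting before scoring, which this sketch leaves out.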

We expect Whisper's accuracy and ease of use to let developers add voice interfaces to a much wider range of applications, and to make it a useful tool as speech recognition and translation continue to evolve.
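As a rough illustration of what that integration can look like, here is a minimal sketch that assumes the open-source `openai-whisper` Python package is installed (it also needs `ffmpeg` to read audio files). `whisper.load_model` and `model.transcribe` come from that package; `build_decode_options`, `speech_to_text`, and the file name in the usage note are hypothetical names introduced here for illustration.

```python
from typing import Optional

VALID_TASKS = ("transcribe", "translate")


def build_decode_options(task: str = "transcribe",
                         language: Optional[str] = None) -> dict:
    """Hypothetical helper: validate and collect options for transcribe()."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    if language is not None:
        # Language code of the spoken audio, e.g. "es"; if omitted,
        # the model tries to detect the language itself.
        options["language"] = language
    return options


def speech_to_text(audio_path: str, model_name: str = "base",
                   task: str = "transcribe",
                   language: Optional[str] = None) -> str:
    """Return the text of an audio file, transcribed or translated to English."""
    # Imported lazily so the helper above works without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_decode_options(task, language))
    return result["text"].strip()
```

Calling `speech_to_text("interview.mp3")` would return a transcript in the spoken language, while `speech_to_text("interview.mp3", task="translate")` would return an English translation. Larger model names generally trade speed for accuracy, so the right choice depends on the application.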

