Wolfram Language

Recognize Speech

Automatic speech recognition (ASR), also known as speech-to-text (STT), is the process of automatically recognizing the words spoken in an audio recording and converting them into text. Speech recognition is widely used in large-scale automatic transcription systems, virtual and home assistants, voice-enabled control systems, dictation systems, automated phone systems and more.

Version 12 introduces SpeechRecognize to perform automatic speech recognition.

Here is a speech signal found on the web.
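The source URL of the signal is not given here, so the following minimal sketch assumes a hypothetical local file named speech.wav. Importing an audio file yields an Audio object whose basic properties can be inspected directly.

```wolfram
(* "speech.wav" is a hypothetical placeholder; substitute any speech recording *)
audio = Import["speech.wav"];

(* Basic properties of the imported Audio object *)
Duration[audio]
AudioSampleRate[audio]
```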

Visualize the spectrogram of the signal.
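A spectrogram shows how the frequency content of the signal changes over time. The sketch below uses the built-in Spectrogram function and again assumes the hypothetical file speech.wav.

```wolfram
(* Hypothetical input file; replace with the actual recording *)
audio = Import["speech.wav"];

(* Plot the short-time spectral content of the signal over time *)
Spectrogram[audio]
```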

And here is the result of speech recognition on that signal. Speech recognition proceeds in two stages: a neural network first computes a raw transcription of the signal, and that transcription is then passed through a language model that corrects spelling errors, among other things.
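SpeechRecognize takes an Audio object and returns the transcription as a string; in this sketch both internal stages run inside that single call and are not inspected separately. It assumes the same hypothetical file speech.wav, and the TextWords step is only an illustrative follow-up, not part of the recognition itself.

```wolfram
(* Hypothetical input file; replace with the actual recording *)
audio = Import["speech.wav"];

(* Run automatic speech recognition; the result is a string *)
transcript = SpeechRecognize[audio]

(* Split the transcription into words, e.g. for counting or further text processing *)
TextWords[transcript]
```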
