Automatic speech recognition (ASR), also known as speech-to-text (STT), is the process of converting recorded speech into text. Speech recognition is heavily used in large-scale automatic transcription systems, virtual and home assistants, voice-enabled control systems, dictation systems, automated phone systems and other voice-driven applications.
Version 12 introduces SpeechRecognize to perform automatic speech recognition.
Here is a speech signal found on the web.
Visualize the spectrogram of the signal.
And here is the result of speech recognition on that signal. Recognition runs in two stages: a neural network first computes a raw transcription of the signal, and a language model then processes that transcription to correct spelling errors and similar mistakes.
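The two stages can be illustrated with a toy sketch in Python. This is only an illustration of the idea, not how SpeechRecognize is implemented. The per-frame scores are made up, greedy CTC-style decoding is an assumed stand-in for the network's output stage, and a dictionary lookup with `difflib` stands in for a real language model. All names here (`greedy_decode`, `correct`, `VOCABULARY`, `frame_for`) are hypothetical.

```python
import difflib

BLANK = "_"  # assumed "no symbol" marker, as used in CTC-style decoding
ALPHABET = "abcdefghijklmnopqrstuvwxyz " + BLANK
VOCABULARY = ["hello", "world", "speech", "recognition"]  # toy word list


def frame_for(winner):
    """Build made-up scores for one audio frame, favouring `winner`."""
    return {sym: (0.9 if sym == winner else 0.01) for sym in ALPHABET}


def greedy_decode(frame_scores):
    """Stage 1: take the best symbol per frame, collapse repeats, drop blanks."""
    out = []
    prev = None
    for scores in frame_scores:
        sym = max(scores, key=scores.get)
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def correct(raw, vocabulary):
    """Stage 2: replace each word with its closest vocabulary entry, if any."""
    fixed = []
    for word in raw.split():
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
        fixed.append(match[0] if match else word)
    return " ".join(fixed)


# Simulated network output: one winning symbol per frame.
frames = [frame_for(sym) for sym in "hh_eel_oo_ ww_uurrl_dd"]

raw_text = greedy_decode(frames)
final_text = correct(raw_text, VOCABULARY)

print(raw_text)    # helo wurld
print(final_text)  # hello world
```

A real language model scores whole word sequences in context rather than fixing words one at a time, so the second stage here is deliberately simplistic. It shows only where the correction step sits in the pipeline.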