Wolfram Language

Inspect Speech Using a Neural Net

This example recognizes speech using the built-in SpeechRecognize function. A neural net from the Wolfram Neural Net Repository is also used to transcribe the signal into a list of characters.

Start from a synthesized speech signal.

Recognize the speech in the audio object.
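
The two steps above might look like the following minimal sketch. The phrase passed to SpeechSynthesize and the variable names speech and text are assumptions chosen for illustration; they are not taken from the original example.

```wolfram
(* Hypothetical input phrase; any short English sentence should work *)
speech = SpeechSynthesize["The quick brown fox jumps over the lazy dog"];

(* Transcribe the resulting Audio object with the built-in recognizer *)
text = SpeechRecognize[speech]
```

The result is a string; how closely it matches the input phrase depends on the synthesized voice available on the system.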

Now, get the pre-trained speech recognition net.

Evaluate the neural net on the synthesized speech. The network returns the list of characters it recognizes in the audio.

Join the characters to get a preliminary version of the recognized speech.
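
A sketch of the last three steps, written so that it can be evaluated on its own. The model name "Deep Speech 2 Trained on Baidu English Data" is an assumption: the text only says that the net comes from the Wolfram Neural Net Repository, so substitute whichever speech recognition model you intend to use. The input phrase is likewise hypothetical.

```wolfram
(* Hypothetical input phrase, synthesized into an Audio object *)
speech = SpeechSynthesize["The quick brown fox jumps over the lazy dog"];

(* Assumed model name; downloaded from the Neural Net Repository on first use *)
net = NetModel["Deep Speech 2 Trained on Baidu English Data"];

(* Applying the net to the audio gives a list of single-character strings *)
chars = net[speech];

(* Concatenate the characters into one string *)
StringJoin[chars]
```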

The network was trained with a connectionist temporal classification (CTC) loss, which maps the sequence of input frames to a sequence of characters while accounting for the fact that a single letter can span multiple frames.
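
To illustrate how a CTC-style decoding deals with letters that span several frames, here is a toy sketch of the simplest (greedy) collapsing rule: merge consecutive repeated labels, then drop the blank symbol. The frame labels and the "_" blank marker are made up for illustration, and the decoder attached to the actual net may use a more elaborate search.

```wolfram
(* Hypothetical per-frame labels; "_" stands for the CTC blank symbol *)
blank = "_";
frames = {"h", "h", "_", "e", "l", "l", "_", "l", "o", "o"};

(* Merge runs of identical consecutive labels into a single label *)
collapsed = First /@ Split[frames];

(* Remove the blanks and join what is left *)
decoded = StringJoin[DeleteCases[collapsed, blank]]
(* gives "hello" *)
```

The blank between the two runs of "l" is what lets the decoding keep a genuine double letter instead of merging it into one.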

You can visualize the output of the network just before the CTC decoding to get the probabilities of all letters at any point in time. The bottom axis is labeled with an intermediate decoding obtained by taking the character with the maximum probability at each frame.
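
One possible way to get at these probabilities is to remove the output decoder from the net and plot the resulting matrix. This is a sketch under assumptions: the model name and input phrase are hypothetical, and the plot omits the axis labeling described above because mapping class indices back to characters requires the decoder's label list, which is not shown here.

```wolfram
(* Hypothetical input phrase and assumed model name *)
speech = SpeechSynthesize["The quick brown fox jumps over the lazy dog"];
net = NetModel["Deep Speech 2 Trained on Baidu English Data"];

(* Remove the output decoder so the net returns per-frame class probabilities *)
probs = NetReplacePart[net, "Output" -> None][speech];

(* Rows are frames, columns are character classes *)
Dimensions[probs]

(* Index of the most probable class in each frame *)
best = First[Ordering[#, -1]] & /@ probs;

(* Time runs along the horizontal axis after transposing *)
ArrayPlot[Transpose[probs], AspectRatio -> 1/3]
```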

