Inspect Speech Using a Neural Net
Start from a synthesized speech signal.
Recognize the speech in the audio object.
Now, get the pre-trained speech recognition net.
Evaluate the neural net on the synthesized speech. The network returns a list of the characters it recognized in the audio recording.
Join characters to get a preliminary version of the recognized speech.
The network was trained with a connectionist temporal classification (CTC) loss, which maps the sequence of input frames to a sequence of characters while accounting for the fact that a single letter can span multiple frames.
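To make the frame-to-character mapping concrete, here is a minimal Python sketch of the standard greedy CTC collapse rule: merge consecutive repeated labels, then drop the blank token. It is an illustration under stated assumptions, not the decoder used by the network above; the blank symbol `_`, the function name `ctc_collapse`, and the sample frame labels are all hypothetical.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC blank token (hypothetical)


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels into text using the greedy CTC rule."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only separate genuine repeated letters.
    return "".join(label for label in merged if label != blank)


# Hypothetical per-frame labels: each letter spans one or more frames.
frames = list("hh_e_ll_ll_oo")
print(ctc_collapse(frames))  # hello
```

The blank between the two runs of `l` is what lets the double letter survive the merge step; without it, `llll` would collapse to a single `l`.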
You can visualize the output of the network just before the CTC decoding to see the probability of every letter at each frame. The bottom axis is labeled with an intermediate decoding, obtained by taking the character with the maximum probability at each frame.
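The intermediate decoding on the bottom axis can be sketched as a per-frame argmax over the probability matrix. The Python example below assumes the pre-decoding output is a list of frames, each holding one probability per symbol; the four-symbol alphabet, the function name `argmax_per_frame`, and the probability values are made up for illustration and do not come from the actual network.

```python
# Hypothetical alphabet; index 0 is assumed to be the CTC blank token.
ALPHABET = ["_", "a", "b", "c"]


def argmax_per_frame(probs, alphabet=ALPHABET):
    """Return the most probable symbol for each frame."""
    labels = []
    for frame in probs:
        # Index of the largest probability in this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        labels.append(alphabet[best])
    return labels


# Made-up probabilities: one row per frame, one column per symbol.
probs = [
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
]
print(argmax_per_frame(probs))  # ['a', 'a', '_', 'b']
```

The result still contains repeated letters and blanks, which is why it is only an intermediate decoding rather than the final recognized text.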