Wolfram Language

Net Encoders for Audio

Several audio-specific NetEncoder objects are now available, tightly integrating the Audio object with the neural net framework. Encoders are a key part of the framework because they provide an easy way to inject data into a neural net: they convert inputs such as Audio objects into the numeric arrays that the net's layers operate on.
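
As a minimal sketch of that idea, an encoder can be attached to a net's "Input" port so that the net accepts Audio objects directly. The architecture (a 16-unit LSTM feeding a 3-class softmax), the untrained random weights and the 440 Hz test tone are illustrative assumptions, not part of the original example.

```wolfram
(* Attach the "AudioMFCC" encoder to the input of a small, untrained net.
   The layer choices and sizes here are illustrative assumptions. *)
net = NetInitialize@NetChain[
    {
      LongShortTermMemoryLayer[16], (* reads the variable-length sequence of MFCC frames *)
      SequenceLastLayer[],          (* keeps the final state as a fixed-size vector *)
      LinearLayer[3],
      SoftmaxLayer[]
    },
    "Input" -> NetEncoder["AudioMFCC"]
  ];

(* A one-second 440 Hz sine tone stands in for a real recording. *)
tone = AudioGenerator[{"Sin", 440}, 1];

(* The encoder turns the Audio object into features before the first layer runs. *)
probs = net[tone]
```

Because the encoder is part of the net, no separate preprocessing step is needed before evaluating or training it.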

Inspect the features that each encoder computes from a recording of a bird.

The "Audio" net encoder simply returns the waveform after resampling it and downmixing it to a single channel.
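
The sketch below builds a two-channel, 44.1 kHz test signal (a synthetic stand-in for the bird recording) and passes it through the "Audio" encoder. It assumes the encoder exposes a "SampleRate" parameter that sets the resampling target; check the NetEncoder reference page for the exact parameter names and defaults.

```wolfram
(* Synthetic stereo signal: 440 Hz on the left, 660 Hz on the right,
   one second at 44.1 kHz. *)
sr = 44100;
left = Table[Sin[2. Pi 440 n/sr], {n, 0, sr - 1}];
right = Table[Sin[2. Pi 660 n/sr], {n, 0, sr - 1}];
stereo = Audio[{left, right}, SampleRate -> sr];

(* "SampleRate" -> 8000 is an assumed parameter setting for the resampling target. *)
enc = NetEncoder[{"Audio", "SampleRate" -> 8000}];
waveform = enc[stereo];

(* Compare the channel count of the input with the shape of the encoded waveform. *)
{AudioChannels[stereo], Dimensions[waveform]}
```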

The "AudioSTFT" net encoder computes the short-time Fourier transform (STFT): the Fourier transform of each of a sequence of partitions of the input signal. The resulting feature contains both time and frequency information.
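
To make "Fourier transform on partitions" concrete, the sketch below applies the encoder to a test tone and then computes a short-time Fourier transform by hand with Partition, a Hann window and Fourier. The 256-sample window, 128-sample hop, Hann taper and 8 kHz sine are assumptions chosen for illustration; the encoder's own defaults, windowing and normalization may differ.

```wolfram
(* Encoder output for a one-second 440 Hz tone; its shape depends on the encoder defaults. *)
tone = AudioGenerator[{"Sin", 440}, 1];
Dimensions[NetEncoder["AudioSTFT"][tone]]

(* Hand-rolled STFT of the same kind of signal, sampled at 8 kHz. *)
sr = 8000;
x = Table[Sin[2. Pi 440 n/sr], {n, 0, sr - 1}];
window = Array[HannWindow, 256, {-1/2, 1/2}]; (* 256-point Hann taper *)
frames = Partition[x, 256, 128];              (* overlapping partitions, hop of 128 samples *)
stft = Map[Fourier[# window] &, frames];      (* one complex spectrum per partition *)

(* Rows index time (partitions), columns index frequency bins. *)
Dimensions[stft]  (* → {61, 256} *)
```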

The "AudioSpectrogram" net encoder returns the power spectrum (the squared magnitude of the Fourier transform) of each partition of the input signal.
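
A hand-rolled power spectrogram shows what "power spectrum computed on partitions" means. As above, the window, hop and test tone are assumptions for illustration, and the encoder's scaling and defaults may differ; the last line checks that the strongest frequency bin lands near the 440 Hz input.

```wolfram
(* Encoder output for a one-second 440 Hz tone. *)
tone = AudioGenerator[{"Sin", 440}, 1];
Dimensions[NetEncoder["AudioSpectrogram"][tone]]

(* Hand-rolled power spectrogram at 8 kHz with 256-sample Hann-windowed partitions. *)
sr = 8000;
x = Table[Sin[2. Pi 440 n/sr], {n, 0, sr - 1}];
window = Array[HannWindow, 256, {-1/2, 1/2}];
frames = Partition[x, 256, 128];

(* Squared magnitude of each spectrum; a real 256-sample frame has 129 non-negative-frequency bins. *)
power = Map[Take[Abs[Fourier[# window]]^2, 129] &, frames];
Dimensions[power]  (* → {61, 129} *)

(* Bin k (1-based) corresponds to (k - 1) sr/256 Hz, so the strongest bin of a middle frame lies near 440 Hz. *)
peakHz = (First[Ordering[power[[30]], -1]] - 1) sr/256.  (* → 437.5 *)
```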

The "AudioMelSpectrogram" net encoder returns a spectrogram filtered so that its frequency bins are spaced nonlinearly, on the mel scale, to mimic human pitch perception.
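
The nonlinear spacing can be seen directly from the mel scale. The sketch uses a widely used mel formula, m = 2595 log10(1 + f/700); that the encoder uses this exact variant, and the choice of 10 filters up to 4000 Hz, are assumptions for illustration.

```wolfram
(* Encoder output for a one-second 440 Hz tone. *)
tone = AudioGenerator[{"Sin", 440}, 1];
Dimensions[NetEncoder["AudioMelSpectrogram"][tone]]

(* A common mel-scale formula and its inverse (assumed variant). *)
mel[f_] := 2595 Log10[1 + f/700];
invMel[m_] := 700 (10^(m/2595) - 1);

mel[1000.]  (* ≈ 1000: the scale maps 1000 Hz to about 1000 mel *)

(* 12 points equally spaced in mel from 0 to 4000 Hz: the band edges of 10
   triangular filters. Converted back to hertz, the spacing widens with frequency. *)
edges = invMel /@ Subdivide[mel[0.], mel[4000.], 11];
Differences[edges]
```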

The "AudioMFCC" net encoder computes mel-frequency cepstral coefficients (MFCCs), reducing the dimensionality of the mel spectrogram further while preserving most of the information contained in the signal.
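
The sketch below compares the feature sizes of the two encoders on the same tone, then applies the textbook MFCC step (log, type-II discrete cosine transform, keep the first coefficients) to one synthetic mel frame. The 40-band frame, its values and the 13-coefficient cut-off are illustrative assumptions, and the encoder's internal implementation may differ.

```wolfram
(* Compare the per-frame feature sizes of the two encoders on the same tone. *)
tone = AudioGenerator[{"Sin", 440}, 1];
{Dimensions[NetEncoder["AudioMelSpectrogram"][tone]],
 Dimensions[NetEncoder["AudioMFCC"][tone]]}

(* Textbook MFCC step on one synthetic 40-band mel frame: take the log,
   apply a type-II DCT, and keep only the first 13 coefficients. *)
melFrame = Table[Exp[-(k - 10.)^2/20] + 0.01, {k, 40}];
mfcc = Take[FourierDCT[Log[melFrame], 2], 13];
Length[mfcc]  (* → 13 *)
```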
