Wolfram Language

Data Augmentation in Net Encoders

The built-in audio NetEncoders can perform a variety of data augmentations online prior to computing features. Data augmentation is useful for increasing the effective size of a dataset to make trained models more robust against overfitting or to add invariance to some specific aspects of the data.

Create a test signal and plot it.

Create an encoder that trims each training example by an amount randomly sampled from a uniform distribution between ± . This can help make a model more invariant to the locality of an audio event.

Create an encoder that randomly adjusts the amplitude of its inputs by multiplying each example by a constant factor randomly sampled from a uniform distribution.

Create an encoder that adds noise from a specified model (here a sine wave) to each example, with a noise level sampled from a uniform distribution between 0 and 0.1.

Create an encoder that convolves each training example with another signal, using a randomly sampled mixture level. This is useful for simulating the effects of different recording environments, e.g. by adding reverberation.

When using either of the "AudioMelSpectrogram" or "AudioMFCC" encoders, the spacing of the centers of the filter banks used to summarize the power spectrum can be randomly warped in order to simulate the effect of different vocal tract lengths in human speech production. Create an encoder using the "VTLP" augmentation, where the warp factor is randomly sampled from a uniform distribution between .5 and 2.

Related Examples

de es fr ja ko pt-br zh