Wolfram Language

Classify an Audio Dataset Using Transfer Learning

Sometimes the amount of data available to train a network is insufficient for the task at hand. Transfer learning is a possible solution to this problem. Instead of training a network from scratch, you can start from a net that has already been trained on a different but related task.

Download the ESC-50 dataset.
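The download step might look roughly like the sketch below. The archive URL and the local paths are assumptions, since the text above does not give them; adjust them to wherever the dataset actually lives.

```wolfram
(* Assumed location of the dataset archive; not stated in the text above *)
url = "https://github.com/karolpiczak/ESC-50/archive/master.zip";

(* Hypothetical local paths under the temporary directory *)
archivePath = FileNameJoin[{$TemporaryDirectory, "ESC-50-master.zip"}];
datasetDir = FileNameJoin[{$TemporaryDirectory, "ESC-50"}];

(* Download only if the archive is not already present *)
If[! FileExistsQ[archivePath], URLDownload[url, archivePath]];

(* Unpack only once, creating the target directory first *)
If[! DirectoryQ[datasetDir],
  CreateDirectory[datasetDir];
  ExtractArchive[archivePath, datasetDir]];
```

Guarding both steps with existence checks keeps repeated evaluations of the notebook from fetching and unpacking the archive again.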

Import the metadata. The dataset is a labeled collection of 2000 environmental audio recordings. The files are five-second-long recordings organized into 50 semantic classes.
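As a minimal sketch of the import step, the snippet below parses a small inline CSV into a Dataset. The column names are assumed to match the layout of the ESC-50 metadata file, and the three rows are made-up stand-ins rather than real entries.

```wolfram
(* Made-up sample rows in the assumed column layout of the metadata file *)
csv = StringRiffle[{
    "filename,fold,target,category",
    "clip-0001.wav,1,0,dog",
    "clip-0002.wav,1,1,rooster",
    "clip-0003.wav,2,0,dog"}, "\n"];

(* Treat the first line as column headers and return a Dataset *)
metadata = ImportString[csv, {"CSV", "Dataset"}, "HeaderLines" -> 1];

Length[metadata]                           (* number of rows *)
metadata[1]                                (* first row, keyed by column name *)
Union[Normal[metadata[All, "category"]]]   (* distinct class labels *)
```

For the real dataset, the same element and option would be passed to Import together with the path to the metadata CSV inside the extracted archive.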

Inspect a sample from the metadata.

Divide the dataset into training and testing subsets.
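One simple way to split the data is to shuffle it and cut it at a fixed fraction. The sketch below uses synthetic file names and labels as stand-ins for the file-to-category rules built from the metadata, and the 80/20 ratio is an assumption, not something the text specifies.

```wolfram
SeedRandom[1234];  (* make the shuffle reproducible *)

(* Hypothetical stand-in for rules of the form file -> category *)
examples = Table[
   "clip" <> ToString[i] <> ".wav" -> RandomChoice[{"dog", "rain", "siren"}],
   {i, 100}];

(* Shuffle, then take the first 80% for training and keep the rest for testing *)
{trainingData, testData} =
  TakeDrop[RandomSample[examples], Round[0.8 Length[examples]]];

Length /@ {trainingData, testData}
```

Shuffling before the cut matters when the metadata is sorted by class or by fold, because an unshuffled split could leave whole classes out of one subset.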

Take a look at the available classes.

Construct a feature extractor net by chopping the classifier layers off the AudioIdentify network.
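The chopping step can be illustrated on a toy network. In the real workflow the pretrained net would presumably be fetched with NetModel, but the exact model name and the cut point are not given here, so the sketch below uses a small hand-built chain with hypothetical layer names as a stand-in.

```wolfram
(* Toy stand-in for a pretrained net: feature layers followed by a classifier head *)
pretrained = NetInitialize @ NetChain[<|
     "features" -> LinearLayer[16],
     "act" -> Ramp,
     "classifier" -> LinearLayer[5],
     "softmax" -> SoftmaxLayer[]|>,
    "Input" -> 32];

(* Keep everything up to and including the "act" layer, dropping the classifier head *)
featureExtractor = NetTake[pretrained, "act"];

(* The extractor now maps a length-32 input to a length-16 feature vector *)
Dimensions[featureExtractor[RandomReal[1, 32]]]
```

NetTake keeps the learned weights of the layers it retains, so the truncated net can be used for feature extraction without any further training.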

Construct a simple linear classifier network that will be attached to the feature extractor.
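A linear classifier of this kind can be sketched as a single LinearLayer followed by a SoftmaxLayer. The feature size of 16 and the three class names below are hypothetical placeholders; in practice they would come from the output dimensions of the feature extractor and from the dataset's classes.

```wolfram
(* Hypothetical class labels and feature size, for illustration only *)
classes = {"dog", "rain", "siren"};
featureSize = 16;

(* One linear layer producing a score per class, normalized to probabilities;
   the class decoder turns the probability vector into a label *)
classifier = NetChain[
   {LinearLayer[Length[classes]], SoftmaxLayer[]},
   "Input" -> featureSize,
   "Output" -> NetDecoder[{"Class", classes}]];

(* With random weights the prediction is arbitrary, but it is always one of the classes *)
NetInitialize[classifier][RandomReal[1, featureSize]]
```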

Retraining the full net with a LearningRateMultipliers option in NetTrain, so that only the classification layers are updated, would re-evaluate the frozen feature extractor on every training round. Instead, you can precompute the output of the feature extractor once and train the classifier on those features, which avoids that redundant evaluation.

Train the classifier network using NetTrain.

Join the feature extractor network and the trained classifier using NetJoin.
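The precompute, train, and join steps can be sketched end to end on toy data. Everything below is a stand-in: the extractor is a small randomly initialized chain rather than the pretrained audio net, the inputs are synthetic vectors rather than recordings, and the number of training rounds is arbitrary.

```wolfram
SeedRandom[1];
classes = {"a", "b"};

(* Toy stand-in for the frozen feature extractor *)
featureExtractor = NetInitialize @ NetChain[{LinearLayer[8], Ramp}, "Input" -> 4];

(* Synthetic inputs: class "a" lies around -1, class "b" around +1 *)
inputs = Join[RandomReal[{-1.5, -0.5}, {50, 4}], RandomReal[{0.5, 1.5}, {50, 4}]];
labels = Join[ConstantArray["a", 50], ConstantArray["b", 50]];

(* Run the extractor once over all inputs instead of once per training round *)
features = featureExtractor[inputs];

classifier = NetChain[
   {LinearLayer[Length[classes]], SoftmaxLayer[]},
   "Input" -> 8,
   "Output" -> NetDecoder[{"Class", classes}]];

(* Train only the classifier, on the precomputed features *)
trained = NetTrain[classifier, features -> labels, MaxTrainingRounds -> 20];

(* Attach the trained classifier to the extractor to get a net that takes raw inputs *)
fullNet = NetJoin[featureExtractor, trained];
fullNet[{-1., -1., -1., -1.}]
```

Because the joined net simply composes the two parts, evaluating fullNet on a raw input should give the same label as applying trained to the extractor's output for that input.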

Using ClassifierMeasurements, compute the accuracy on the test data and plot the confusion matrix for the four worst-classified classes.
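The evaluation step might look like the sketch below, run here on a toy six-class problem so that it does not depend on the audio data. Two assumptions are worth flagging: "worst" is taken to mean lowest per-class recall, and the "ConfusionMatrixPlot" property is assumed to accept a list of classes to restrict the plot to.

```wolfram
SeedRandom[7];
classes = {"c1", "c2", "c3", "c4", "c5", "c6"};

(* Synthetic 2D points scattered around one random center per class *)
centers = RandomReal[{-2, 2}, {6, 2}];
makeData[n_] := Flatten[
   Table[
    (centers[[k]] + RandomVariate[NormalDistribution[0, 0.7], 2]) -> classes[[k]],
    {k, 6}, {n}],
   1];
trainingData = makeData[40];
testData = makeData[20];

(* Small toy classifier standing in for the joined audio net *)
net = NetTrain[
   NetChain[{LinearLayer[16], Ramp, LinearLayer[6], SoftmaxLayer[]},
    "Input" -> 2, "Output" -> NetDecoder[{"Class", classes}]],
   trainingData, MaxTrainingRounds -> 30];

cm = ClassifierMeasurements[net, testData];
accuracy = cm["Accuracy"]

(* Assumption: rank classes by recall and keep the four lowest *)
worst = Keys[TakeSmallest[cm["Recall"], 4]]

(* Assumption: the plot can be restricted to a list of classes *)
cm["ConfusionMatrixPlot" -> worst]
```

Computing the measurements object once and querying it for several properties avoids re-running the net over the test set for each metric.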
