Wolfram Language

Computational Audio

Use DTW to Compare Recordings

Import, trim, and preprocess four recordings of the first sentence of Alice in Wonderland.

show complete Wolfram Language input
In[1]:=
Click for copyable input
urls = {"http://ia800503.us.archive.org/3/items/alices_adventures/\ aliceinwonderland_01_carroll.mp3", "http://ia800306.us.archive.org/25/items/alice_wonderland_0711_\ librivox/alice_01_carroll.mp3", "http://ia800201.us.archive.org/32/items/alices_adventures_1003/\ alices_adventures_01_carroll.mp3", "https://ia800904.us.archive.org/15/items/alicesadventure_abridged_\ pc_librivox/alicesadventuresinwonderlandabridged_01_carroll.mp3"}; times = {{27, 33.5}, {16.5, 25}, {22.3, 28.5}, {31, 38}};
In[2]:=
Click for copyable input
alice = ConformAudio[ MapThread[ AudioNormalize[ AudioChannelMix[AudioTrim[AudioResample[Import[#1], 11025], #2], 1]] &, {urls, times}]]
Out[2]=

Show the plots for the signals.

In[3]:=
Click for copyable input
AudioPlot[alice, ImageSize -> Medium]
Out[3]=

Compute and plot the MFCC features for the samples.

In[4]:=
Click for copyable input
mfcc = AudioLocalMeasurements[#, "MFCC", PartitionGranularity -> {.05, .01}]["Values"] & /@ alice;
In[5]:=
Click for copyable input
Column[MatrixPlot[#, PlotTheme -> "Minimal", ImageSize -> Medium] & /@ Transpose /@ mfcc]
Out[5]=

Compute the dynamic time warping distance between the recordings using WarpingDistance.

In[6]:=
Click for copyable input
DistanceMatrix[mfcc, DistanceFunction -> WarpingDistance] // MatrixPlot
Out[6]=

Compute the dynamic time warping correspondence between two of the recordings using WarpingCorrespondence.

In[7]:=
Click for copyable input
{n, m} = WarpingCorrespondence[mfcc[[1]], mfcc[[2]]];
show complete Wolfram Language input
In[8]:=
Click for copyable input
dur = QuantityMagnitude[Duration[alice[[1]]], "s"]; s = {n, m}\[Transpose]/Max[{n, m}] dur; Labeled[ ListLinePlot[ s, PlotRange -> {{0, dur}, {0, dur}}, AspectRatio -> 1, Axes -> False, PlotStyle -> Thickness[.01], ImageSize -> Medium, Frame -> True, FrameTicks -> None, Prolog -> {RGBColor[ 0.6666666666666666, 0.6666666666666666, 0.6666666666666666], {Line[{{#[[1]], 0}, #}], Line[{{0, #[[2]]}, #}]} & /@ (s[[;; ;; 100]])} ], AudioPlot[#, PlotStyle -> RGBColor[0.560181, 0.691569, 0.194885], Frame -> False, Axes -> False, ImageSize -> Medium, AspectRatio -> 1/15] & /@ (alice[[;; 2]]), {Bottom, Left}, RotateLabel -> True, Spacings -> {0, 0}]
Out[8]=

Related Examples

de es fr ja ko pt-br ru zh