Wolfram Language


Computation with Multilingual Word Lists

Compare the distribution of word lengths, measured in characters, across several languages.

In[1]:=
languages = {"German", "English", "Italian", "Dutch", "Russian"};

Retrieve the available word list for each of those languages and collect the lists in an association keyed by language name.

In[2]:=
words = Association[# -> WordList[Language -> #] & /@ languages];

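Compute the length of every word in each list. Mapping StringLength over an association acts on its values, so the language keys are preserved.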

In[3]:=
wordLengths = StringLength /@ words;

Find the minimum and maximum word length for each language.

In[4]:=
MinMax /@ wordLengths
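To see what the two mapping steps do without downloading any word lists, here is a minimal sketch on a small hypothetical association (the name toy and its contents are made up for illustration). StringLength is Listable, so mapping it over the association turns each word list into a list of lengths, and MinMax then returns a {min, max} pair for each key.

```wolfram
(* Hypothetical toy data standing in for the real word lists *)
toy = <|"A" -> {"ab", "abc"}, "B" -> {"abcd"}|>;

(* Map acts on the values; StringLength threads over each list of words *)
toyLengths = StringLength /@ toy
(* → <|"A" -> {2, 3}, "B" -> {4}|> *)

(* {minimum, maximum} length for each key *)
MinMax /@ toyLengths
(* → <|"A" -> {2, 3}, "B" -> {4, 4}|> *)
```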

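Show overlapping histograms of the relative frequency of each word length, one per language. The "PDF" height specification normalizes each language's histogram separately, so word lists of different sizes can be compared directly. Russian and English have a higher fraction of shorter words, while Dutch and German have a pronounced tail of longer words.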

In[5]:=
Histogram[wordLengths, Automatic, "PDF", ChartLegends -> Automatic]
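The visual comparison can be backed with summary statistics. This is a minimal sketch that repeats the setup from In[1] to In[3] so it runs on its own, then computes the mean and median word length per language with Mean and Median. The results depend on the word lists shipped with your version (WordList may download data on first use), so no values are shown here.

```wolfram
(* Setup repeated from In[1] to In[3] *)
languages = {"German", "English", "Italian", "Dutch", "Russian"};
wordLengths = StringLength /@ Association[# -> WordList[Language -> #] & /@ languages];

(* Mean word length per language, converted to approximate numbers *)
N[Mean /@ wordLengths]

(* Median word length per language *)
Median /@ wordLengths
```

A lower mean or median for Russian and English than for Dutch and German would be consistent with the histograms; the mean is more sensitive to the long-word tail than the median.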

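Stack the histograms to show the total count of words of each length across all languages. Because the heights are raw counts rather than relative frequencies, languages with larger word lists contribute more to the totals.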

In[6]:=
Histogram[wordLengths, ChartLegends -> Automatic, ChartLayout -> "Stacked"]
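The difference between the two height specifications can be checked by hand with BinCounts. In this sketch, lenA and lenB are hypothetical word lengths for two lists of different sizes, binned into unit-width bins. Stacked bar heights are the bin-wise sums of the counts, while dividing each count by the size of its list gives the relative frequencies that "PDF" reports when the bins have width 1.

```wolfram
(* Hypothetical word lengths for two lists of different sizes *)
lenA = {2, 3, 3};
lenB = {3, 4, 4, 4, 5, 5};

(* Counts per unit-width bin [1,2), [2,3), ..., [5,6) *)
countsA = BinCounts[lenA, {1, 6, 1}]  (* → {0, 1, 2, 0, 0} *)
countsB = BinCounts[lenB, {1, 6, 1}]  (* → {0, 0, 1, 3, 2} *)

(* Stacked heights: bin-wise sums of the raw counts *)
countsA + countsB  (* → {0, 1, 3, 3, 2} *)

(* Relative frequencies: counts divided by the list size *)
countsA/Length[lenA]  (* → {0, 1/3, 2/3, 0, 0} *)
```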
