Examine Characteristics of Languages, Alphabets, and Scripts

Version 11 of the Wolfram Language provides extensive built-in knowledge about languages, writing scripts, and alphabets.

Different languages may share the same writing script (or writing system) yet use different alphabets. This example explores how widely the number of characters varies among alphabets written in the Latin script.

Take the list of alphabets that use the Latin writing script.

In[1]:=
alphabets = EntityList[EntityClass["Alphabet", "WritingScripts" -> Entity["WritingScript", "Latin::6tr5q"]]];
In[2]:=
Length[alphabets]
Out[2]= 131

There are 131 such alphabets. Show a small sample of them.

In[3]:=
RandomSample[alphabets, 15]
Out[3]=

Construct an association storing the list of characters of each alphabet.

In[4]:=
letters = EntityValue[alphabets, "CommonAlphabet", "EntityAssociation"];

The shortest alphabet is Mohawk, with 12 letters.

In[5]:=
letters[Entity["Alphabet", "Mohawk::p8wq4"]]
Out[5]=

The longest alphabet is Slovak, with 46 characters.

In[6]:=
letters[Entity["Alphabet", "Slovak::kj62d"]]
Out[6]=
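
The two extremes can also be found programmatically rather than by inspection. The following is a minimal sketch that continues the session, assuming `letters` is the association from In[4] and that each value is a plain list of characters; the name `sizes` is introduced here only for illustration.

```wolfram
(* Assumes `letters` from In[4]: alphabet entity -> list of characters *)
sizes = Length /@ letters;

(* Smallest and largest alphabets, each returned as <|entity -> size|> *)
TakeSmallest[sizes, 1]
TakeLargest[sizes, 1]

(* TakeSmallest returns a single entry; this lists every alphabet tied for the minimum *)
Keys[Select[sizes, # == Min[Values[sizes]] &]]
```

If the data matches the text, the first two results should name Mohawk and Slovak.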

This histogram shows that the most common length is 26 letters, like English, though not all 26-letter alphabets contain the same letters.

In[7]:=
Histogram[Length /@ letters, 30]
Out[7]=
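
To see which 26-letter alphabets differ from English, one option is to compare each against the letters a through z. This sketch assumes `letters` from In[4] and that the characters are stored as lowercase single-character strings; `twentySix` and `english` are illustrative names, not part of the original example.

```wolfram
(* Assumes `letters` from In[4]; keep only the alphabets with exactly 26 characters *)
twentySix = Select[letters, Length[#] == 26 &];
Length[twentySix]

(* Assumption: the basic English letters are "a" through "z", in lowercase *)
english = CharacterRange["a", "z"];

(* For each 26-letter alphabet, the characters that English lacks; drop exact matches *)
Select[Complement[#, english] & /@ twentySix, # =!= {} &]
```

The result maps each differing alphabet to its non-English characters.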

Now count the number of alphabets in which each letter appears. Only three letters are present in all 131 Latin alphabets: a, i, and n.

In[8]:=
TakeLargest[Counts[Flatten[Values[letters]]], 10]
Out[8]=
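
The letters common to every alphabet can also be extracted directly instead of being read off the top counts. A minimal sketch, again assuming `letters` from In[4] with list-valued entries; `counts` is an illustrative name.

```wolfram
(* Letters shared by every alphabet: intersect all character lists *)
Intersection @@ Values[letters]

(* The same set via counts; DeleteDuplicates guards against a letter listed twice in one alphabet *)
counts = Counts[Flatten[DeleteDuplicates /@ Values[letters]]];
Keys[Select[counts, # == Length[letters] &]]
```

According to the text, both expressions should yield a, i, and n, possibly in a different order.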

Mohawk does not contain the letter m, and the Hawaiian alphabet is the only one not containing t.

In[9]:=
letters[Entity["Alphabet", "Hawaiian::p38r5"]]
Out[9]=
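
Claims of this kind can be checked by listing the alphabets that lack a given letter. The helper `alphabetsLacking` below is hypothetical, introduced only for this sketch; it assumes `letters` from In[4] and lowercase single-character strings.

```wolfram
(* Hypothetical helper: alphabets whose character list does not include the given letter *)
alphabetsLacking[letter_String] := Keys[Select[letters, ! MemberQ[#, letter] &]]

(* Per the text, this should return only the Hawaiian alphabet *)
alphabetsLacking["t"]

(* Per the text, this list should include Mohawk *)
alphabetsLacking["m"]
```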
