
Find the Most Common Word in the Gettysburg Address

What word occurs most often in the Gettysburg Address?

Run the code to get the text of the Gettysburg Address. Try getting other texts, like "AliceInWonderland" or "ToBeOrNotToBe":

The Wolfram Language has a wealth of built-in examples that are handy for experimenting and testing. This gives a list of the kinds of examples that are available:

ExampleData[]

Give the name of a category to see the examples of that type:

ExampleData["Text"]

Give the specific name to get an example:

ExampleData[{"Text", "GettysburgAddress"}]

Store the text in a variable so that later steps can use it:

text = ExampleData[{"Text", "GettysburgAddress"}]

Split the text into individual lowercase words:

Note: run the code in the previous step first.

This splits the Gettysburg Address text into words:

TextCases[text, "Word"]

Make all of the words lowercase so that, for example, "The" and "the" both appear as "the" in the list:

ToLowerCase[TextCases[text, "Word"]]

Store the lowercase words in a variable for the following steps:

words = ToLowerCase[TextCases[text, "Word"]]
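
The effect of lowercasing is easier to see on a short input. The following is a minimal sketch that uses a made-up sample sentence (the string and the names sample and sampleWords are illustrative assumptions, not part of the exploration):

```wolfram
(* Hypothetical sample sentence, chosen so that one word appears with two capitalizations *)
sample = "The people saw the People.";

(* Split into words, then lowercase them so "The" and "the" are treated as the same word *)
sampleWords = ToLowerCase[TextCases[sample, "Word"]]

(* Count how many times the lowercase word "the" appears in the list *)
Count[sampleWords, "the"]
```

Without the ToLowerCase step, "The" and "the" would be treated as two different words in any later tally.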

Find the most common word:

Note: run the code in the previous step first.

Commonest gives a list of the most common elements in a list, so if several elements tie, all of them are returned. Applied to words, it gives the most common word in the Gettysburg Address:

Commonest[words]
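
To see the frequencies behind that answer, one option is to tally every word and look at the largest counts. This is a sketch rather than part of the original exploration: it rebuilds the word list so it can run on its own, the name wordCounts is an assumption, and the choice of 5 is arbitrary:

```wolfram
(* Rebuild the word list so this sketch does not depend on earlier steps *)
words = ToLowerCase[TextCases[ExampleData[{"Text", "GettysburgAddress"}], "Word"]];

(* Counts gives an association that maps each distinct word to its number of occurrences *)
wordCounts = Counts[words];

(* Show the five most frequent words together with their counts *)
TakeLargest[wordCounts, 5]
```

The word with the largest count in this result should match what Commonest returns.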

Find the most common significant word:

A stopword is a commonly used word, like "that" or "the", that doesn't reveal much about the content of a text.

Use DeleteStopwords to remove insignificant words from a text:

DeleteStopwords[words]

Apply Commonest to the remaining words to find the most common significant word:

Commonest[DeleteStopwords[words]]

Find the three most common significant words. Try other numbers of words:

Commonest[DeleteStopwords[words], 3]
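
The steps above can be combined into one function, which makes it easier to try other texts and other numbers of words. This is only a sketch: commonSignificantWords is a hypothetical helper name, and it assumes that name is one of the text names listed by ExampleData["Text"]:

```wolfram
(* Hypothetical helper, not a built-in function:
   name is an ExampleData text name, n is how many words to return *)
commonSignificantWords[name_String, n_Integer] :=
 Commonest[
  DeleteStopwords[
   ToLowerCase[TextCases[ExampleData[{"Text", name}], "Word"]]],
  n]

(* The three most common significant words in the Gettysburg Address *)
commonSignificantWords["GettysburgAddress", 3]

(* The same analysis on another built-in text, with a different number of words *)
commonSignificantWords["AliceInWonderland", 5]
```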