 22 Machine Learning
We'll talk about how to train the language yourself. But first let's look at some built-in functions that have already been trained on huge numbers of examples.
LanguageIdentify takes pieces of text, and identifies what human language they're in.
 In[1]:=
 Out[1]=
 In[2]:=
 Out[2]=
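The notebook cells above aren't reproduced here, but a typical call looks like this (the sample strings are my own illustrations, not necessarily the book's originals):

```wolfram
LanguageIdentify["merci beaucoup"]            (* a French-language entity *)
LanguageIdentify[{"hello", "hola", "danke", "grazie"}]
```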
There's a general function Classify, which has been taught various kinds of classification. One example is classifying the sentiment of text.
 In[3]:=
 Out[3]=
Downbeat text is classified as having negative sentiment:
 In[4]:=
 Out[4]=
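A sketch of the kind of input involved, using the built-in "Sentiment" classifier (the exact sentences in the book's cells may differ):

```wolfram
Classify["Sentiment", "I'm so excited to be programming"]   (* likely Positive *)
Classify["Sentiment", "this is a gloomy, miserable day"]    (* likely Negative *)
```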
You can also train Classify yourself. Here's a simple example of classifying handwritten digits as 0 or 1. You give Classify a collection of training examples, followed by a particular handwritten digit. Then it'll tell you whether the digit you give is a 0 or a 1.
With training examples, Classify correctly identifies a handwritten 0:
 In[5]:=
 Out[5]=
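The book's cells use actual handwritten-digit images, which aren't reproduced here. This simpler stand-in shows the same train-then-classify pattern, using labeled numbers instead of images:

```wolfram
(* training examples are given as rules: input -> class *)
c = Classify[{1 -> "small", 2 -> "small", 3 -> "small",
              8 -> "large", 9 -> "large", 10 -> "large"}];
c[2.5]   (* most plausibly classified as "small" *)
```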
Nearest finds what element of a list is nearest to the item you give. Find what number in the list is nearest to 22:
 In[6]:=
 Out[6]=
Find the nearest three numbers:
 In[7]:=
 Out[7]=
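Assuming a sample list like the one below (the book's actual list isn't shown here), the calls look like:

```wolfram
Nearest[{10, 20, 30, 40, 50, 60, 70, 80}, 22]      (* {20} *)
Nearest[{10, 20, 30, 40, 50, 60, 70, 80}, 22, 3]   (* {20, 30, 10} *)
```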
Nearest can find nearest colors as well.
Find the 3 colors in the list that are nearest to the color you give:
 In[8]:=
 Out[8]=
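With a list of named colors as a stand-in for the book's input (the particular colors here are my own choice), the call has the same shape as for numbers:

```wolfram
Nearest[{Red, Green, Blue, Orange, Yellow, Purple}, Pink, 3]
```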
 In[9]:=
 Out[9]=
 In[10]:=
 Out[10]=
TextRecognize recognizes text in images. Even when an image of text has been somewhat blurred, TextRecognize can still recognize the original text string in it.
Recognize text in the image:
 In[11]:=
 Out[11]=
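One way to make a mildly blurred piece of text and recognize it; the specific string, font size, and blur radius here are my assumptions, not the book's exact values:

```wolfram
TextRecognize[Blur[Rasterize[Style["hello", 50]], 2]]   (* still readable at mild blur *)
```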
If the text gets too blurred, TextRecognize can't tell what it says, and you probably can't either.
Generate a sequence of progressively more blurred pieces of text:
 In[12]:=
 Out[12]=
As the text gets more blurred, TextRecognize makes a mistake, then gives up altogether:
 In[13]:=
 Out[13]=
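A sketch of scanning across blur amounts (again with an assumed string and size): at small radii the recognition should succeed, then degrade, then return nothing.

```wolfram
Table[TextRecognize[Blur[Rasterize[Style["hello", 50]], r]], {r, 1, 10}]
```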
Something similar happens if we progressively blur the picture of a cheetah. When the picture is still fairly sharp, ImageIdentify will correctly identify it as a cheetah. But when it gets too blurred ImageIdentify starts thinking it's more likely to be a lion, and eventually the best guess is that it's a picture of a person.
Progressively blur a picture of a cheetah:
 In[14]:=
 Out[14]=
When the picture gets too blurred, ImageIdentify no longer thinks it's a cheetah:
 In[15]:=
 Out[15]=
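A sketch of the blurring experiment; here cheetah is a placeholder symbol standing for an actual photograph (for example, one pasted in with ctrl+=):

```wolfram
(* cheetah = an actual photograph of a cheetah *)
Table[ImageIdentify[Blur[cheetah, r]], {r, 1, 5}]
```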
ImageIdentify normally just gives what it thinks is the most likely identification. You can tell it, though, to give a list of possible identifications, starting from the most likely. Here are the top 10 possible identifications, in all categories.
ImageIdentify thinks this might be a cheetah, but it's more likely to be a lion, or it could be a dog:
 In[16]:=
 Out[16]=
When the image is sufficiently blurred, ImageIdentify can have wild ideas about what it might be:
 In[17]:=
 Out[17]=
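Asking for multiple candidate identifications takes this form; the second argument All means all categories, and the third asks for the 10 most likely. As above, cheetah is a placeholder for a real photograph, and the blur radius is my assumption:

```wolfram
ImageIdentify[Blur[cheetah, 5], All, 10]
```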
In machine learning, one often gives training that explicitly says, for example, "this is a cheetah", "this is a lion". But one also often just wants to automatically pick out categories of things without any specific training.
Collect clusters of similar colors into separate lists:
 In[18]:=
 Out[18]=
 In[19]:=
 Out[19]=
Show nearby colors successively grouped together:
 In[20]:=
 Out[20]=
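A hedged sketch of unsupervised grouping; the inputs (random colors, a table of hues) are my own sample data rather than the book's exact cells:

```wolfram
FindClusters[RandomColor[50]]                 (* lists of similar colors *)
Dendrogram[Table[Hue[h], {h, 0, 1, 0.1}]]     (* successive grouping of nearby colors *)
```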
In the Wolfram Language, FeatureSpacePlot takes collections of objects and tries to find what it considers the best distinguishing features of them, then uses the values of these to position objects in a plot.
FeatureSpacePlot doesn't explicitly say what features it's using, and actually they're usually quite hard to describe. But what happens in the end is that FeatureSpacePlot arranges things so that objects that have similar features are drawn nearby.
FeatureSpacePlot places similar colors nearby:
 In[21]:=
 Out[21]=
If one uses, say, 100 colors picked completely at random, then FeatureSpacePlot will again place colors it considers similar nearby.
100 random colors laid out by FeatureSpacePlot:
 In[22]:=
 Out[22]=
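The random-color experiment can be written in one line (RandomColor[100] is my stand-in for however the book generates its colors):

```wolfram
FeatureSpacePlot[RandomColor[100]]
```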
Let's try the same kind of thing with images of letters.
Make a rasterized image of each letter in the alphabet:
 In[23]:=
 Out[23]=
FeatureSpacePlot will use visual features of these images to lay them out. The result is that letters that look similar, like y and v or e and c, will wind up nearby.
 In[24]:=
 Out[24]=
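Rasterizing the alphabet and feeding the images to FeatureSpacePlot can be combined into a single expression along these lines:

```wolfram
FeatureSpacePlot[Table[Rasterize[x], {x, Alphabet[]}]]
```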
FeatureSpacePlot places photographs of different kinds of things quite far apart:
 In[25]:=
 Out[25]=
LanguageIdentify[text]  identify what human language text is in
ImageIdentify[image]  identify what an image is of
TextRecognize[image]  recognize text from an image (OCR)
Classify[training, data]  classify data on the basis of training examples
Nearest[list, item]  find what element of list is nearest to item
FindClusters[list]  find clusters of similar items
NearestNeighborGraph[list, n]  connect elements of list to their n nearest neighbors
Dendrogram[list]  make a hierarchical tree of relations between items
FeatureSpacePlot[list]  plot elements of list in an inferred "feature space"
22.1 Identify what language the word "ajatella" comes from. »
Expected output:
 Out[]=
22.2 Apply ImageIdentify to an image of a tiger, getting the image using ctrl+=. »
Expected output:
 Out[]=
22.3 Make a table of image identifications for an image of a tiger, blurred by an amount from 1 to 5. »
Expected output:
 Out[]=
22.4 Classify the sentiment of "I'm so happy to be here". »
Expected output:
 Out[]=
22.5 Find the 10 words in WordList[] that are nearest to "happy". »
Sample expected output:
 Out[]=
22.6 Generate 20 random numbers up to 1000 and find which 3 are nearest to 100. »
Sample expected output:
 Out[]=
22.7 Generate a list of 10 random colors, and find which 5 are closest to Red. »
Sample expected output:
 Out[]=
22.8 Of the first 100 squares, find the one nearest to 2000. »
Expected output:
 Out[]=
22.9 Find the 3 European flags nearest to the flag of Brazil. »
Expected output:
 Out[]=
22.10 Make a graph of the 2 nearest neighbors of each color in Table[Hue[h], {h, 0, 1, .05}]. »
Expected output:
 Out[]=
22.11 Generate a list of 100 random numbers from 0 to 100, and make a graph of the 2 nearest neighbors of each one. »
Sample expected output:
 Out[]=
22.12 Collect the flags of Asia into clusters of similar flags. »
Expected output:
 Out[]=
22.13 Make raster images of the letters of the alphabet at size 20, then make a graph of the 2 nearest neighbors of each one. »
Expected output:
 Out[]=
22.14 Generate a table of the results of using TextRecognize on "hello" rasterized at size 50 and then blurred by between 1 and 10. »
Expected output:
 Out[]=
22.15 Make a dendrogram for images of the first 10 letters of the alphabet. »
Expected output:
 Out[]=
+22.1 Make a table of image identifications for a picture of the Eiffel Tower, blurred by an amount from 1 to 5. »
Expected output:
 Out[]=
+22.2 Classify the sentiment of the Wikipedia article on "happiness". »
Expected output:
 Out[]=
+22.3 Of colors in the list Table[Hue[h], {h, 0, 1, .05}], find which are the 3 nearest to Pink. »
Expected output:
 Out[]=
+22.4 Compute all fractions i/j for i and j up to 500, and find the 10 fractions closest to pi. »
Expected output:
 Out[]=
+22.5 Generate a list of 10 random numbers from 0 to 10, and make a graph of the 3 nearest neighbors of each one. »
Sample expected output:
 Out[]=
+22.6 Find clusters of similar colors in a list of 100 random colors. »
Sample expected output:
 Out[]=
+22.7 Make a feature space plot for both uppercase and lowercase letters of the alphabet. »
Expected output:
 Out[]=
How come I'm getting different results from the ones shown here?
How does ImageIdentify work?
It's based on artificial neural networks, inspired by the way brains seem to work. It's been trained with millions of example images, from which it's progressively learned to make distinctions. And a bit like in the game of twenty questions, by using enough of these distinctions it can eventually determine what an image is of.
How many kinds of things can ImageIdentify recognize?
What makes ImageIdentify give a wrong answer?
A common cause is that what it's asked about isn't close enough to anything it's been trained on. This can happen if something is in an unusual configuration or environment (for example, if a boat is not on a bluish background). ImageIdentify usually tries to find some kind of match, and the mistakes it makes often seem very human-like.
Can I ask ImageIdentify the probabilities it assigns to different identifications?
How many examples does Classify typically need to work well?
If the general area (like everyday images) is one it already knows well, then as few as a hundred. But in areas that are new, it can take many millions of examples to achieve good results.
How does Nearest figure out a distance between colors?
It uses the function ColorDistance, which is based on a model of human color vision.
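ColorDistance can be called directly to see this; the particular color pairs below are my own illustrations:

```wolfram
ColorDistance[Red, Pink]   (* comparatively small *)
ColorDistance[Red, Blue]   (* comparatively large *)
```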
How does Nearest determine nearby words?
By looking at those at the smallest EditDistance, that is, reached by the smallest number of single-letter insertions, deletions and substitutions.
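EditDistance can also be called directly; these word pairs are my own examples:

```wolfram
EditDistance["happy", "harpy"]      (* 1: one substitution *)
EditDistance["kitten", "sitting"]   (* 3: two substitutions and one insertion *)
```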
What features does FeatureSpacePlot use?
There's no easy answer. When it's given a collection of things, it'll learn features that distinguish them, though it's typically primed by having seen many other things of the same general type (like images).