Introduction to the Wolfram Language
22 | Machine Learning
We'll talk about how to train the language yourself. But first let's look at some built-in functions that have already been trained on huge numbers of examples.
LanguageIdentify takes pieces of text, and identifies what human language they're in.
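For example (the sample strings here are illustrative, not the original inputs):

```wolfram
(* identify the language each string is written in;
   these sample strings are illustrative *)
LanguageIdentify[{"thank you very much", "merci beaucoup", "vielen Dank"}]
```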
There's a general function Classify, which has been taught various kinds of classification. One example is classifying the sentiment of text.
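An upbeat piece of text (the sentence is an illustrative stand-in) is typically classified as positive:

```wolfram
(* use the built-in "Sentiment" classifier *)
Classify["Sentiment", "I'm so excited to be programming"]
```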
Downbeat text is classified as having negative sentiment:
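For instance (again with an illustrative input sentence):

```wolfram
Classify["Sentiment", "math can be really hard"]
```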
You can also train Classify yourself. Here's a simple example of classifying handwritten digits as 0 or 1. You give Classify a collection of training examples, followed by a particular handwritten digit. Then it'll tell you whether the digit you give is a 0 or 1.
With training examples, Classify correctly identifies a handwritten 0:
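A sketch of the call, with hypothetical image variables standing in for the handwritten samples:

```wolfram
(* zero1, zero2, one1, one2 and testDigit are hypothetical images of
   handwritten digits; the rules supply the training labels *)
Classify[{zero1 -> 0, zero2 -> 0, one1 -> 1, one2 -> 1}, testDigit]
```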
Another function that works from examples of data is Nearest, which finds what element of a list is nearest to what you supply.
Find what number in the list is nearest to 22:
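With an illustrative list of numbers:

```wolfram
Nearest[{10, 20, 30, 40, 50, 60, 70, 80}, 22]
(* gives {20} *)
```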
Find the nearest three numbers:
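Using the same illustrative list, ask for the three nearest elements, sorted by increasing distance:

```wolfram
Nearest[{10, 20, 30, 40, 50, 60, 70, 80}, 22, 3]
(* gives {20, 30, 10} *)
```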
Nearest can find nearest colors as well.
Find the 3 colors in the list that are nearest to the color you give:
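With an illustrative list of named colors (for colors, Nearest automatically measures distance with ColorDistance):

```wolfram
Nearest[{Red, Orange, Yellow, Green, Blue, Purple}, Pink, 3]
```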
Even after a piece of text has been blurred, TextRecognize can often still recognize the original text string.
Recognize text in the image:
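A sketch, rasterizing the string "hello" and blurring it slightly before recognizing it:

```wolfram
TextRecognize[Blur[Rasterize[Style["hello", 50]], 3]]
```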
If the text gets too blurred, TextRecognize can't tell what it says, and you probably can't either.
Generate a sequence of progressively more blurred pieces of text:
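For example, blurring rasterized text with increasing radius:

```wolfram
Table[Blur[Rasterize[Style["hello", 50]], r], {r, 1, 5}]
```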
As the text gets more blurred, TextRecognize makes a mistake, then gives up altogether:
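Applying TextRecognize at each blur radius shows where recognition breaks down:

```wolfram
Table[TextRecognize[Blur[Rasterize[Style["hello", 50]], r]], {r, 1, 5}]
```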
Something similar happens if we progressively blur the picture of a cheetah. When the picture is still fairly sharp, ImageIdentify will correctly identify it as a cheetah. But when it gets too blurred, ImageIdentify starts thinking it's more likely to be a lion, and eventually its best guess is that it's a picture of a person.
Progressively blur a picture of a cheetah:
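A sketch, where cheetah is a hypothetical image variable standing in for the original photograph:

```wolfram
(* cheetah is a hypothetical image of a cheetah *)
Table[Blur[cheetah, r], {r, 1, 5}]
```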
When the picture gets too blurred, ImageIdentify no longer thinks it's a cheetah:
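With the same hypothetical image:

```wolfram
ImageIdentify[Blur[cheetah, 5]]  (* cheetah is a hypothetical image *)
```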
ImageIdentify normally just gives what it thinks is the most likely identification. You can tell it, though, to give a list of possible identifications, starting from the most likely. Here are the top 10 possible identifications, in all categories.
ImageIdentify thinks this might be a cheetah, but it's more likely to be a lion, or it could be a dog:
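ImageIdentify[image, All, n] asks for the n most likely identifications across all categories (cheetah again stands for a hypothetical image):

```wolfram
ImageIdentify[Blur[cheetah, 3], All, 10]
```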
When the image is sufficiently blurred, ImageIdentify can have wild ideas about what it might be:
In machine learning, one often gives training that explicitly says, for example, "this is a cheetah", "this is a lion". But one also often just wants to automatically pick out categories of things without any specific training.
Collect clusters of similar colors into separate lists:
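For example, with 100 random colors:

```wolfram
FindClusters[RandomColor[100]]
```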
Show nearby colors successively grouped together:
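Dendrogram draws the hierarchical grouping as a tree; here with 20 random colors:

```wolfram
Dendrogram[RandomColor[20]]
```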
In the Wolfram Language, FeatureSpacePlot takes collections of objects and tries to find what it considers their best distinguishing features, then uses the values of these features to position the objects in a plot.
FeatureSpacePlot doesn't explicitly say what features it's using, and actually they're usually quite hard to describe. But what happens in the end is that FeatureSpacePlot arranges things so that objects that have similar features are drawn nearby.
FeatureSpacePlot makes similar colors be placed nearby:
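With an illustrative set of named colors:

```wolfram
FeatureSpacePlot[{Red, Orange, Yellow, Green, Cyan, Blue, Purple, Pink, Brown}]
```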
If one uses, say, 100 colors picked completely at random, then FeatureSpacePlot will again place colors it considers similar nearby.
100 random colors laid out by FeatureSpacePlot:
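For example:

```wolfram
FeatureSpacePlot[RandomColor[100]]
```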
Let's try the same kind of thing with images of letters.
Make a rasterized image of each letter in the alphabet:
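For example, rasterizing each lowercase letter:

```wolfram
Rasterize[Style[#, 20]] & /@ Alphabet[]
```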
FeatureSpacePlot will use visual features of these images to lay them out. The result is that letters that look similar, like y and v or e and c, will wind up nearby.
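Applying FeatureSpacePlot to the rasterized letters:

```wolfram
FeatureSpacePlot[Rasterize[Style[#, 20]] & /@ Alphabet[]]
```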
FeatureSpacePlot places photographs of different kinds of things quite far apart:
LanguageIdentify[text] identify what human language text is in
ImageIdentify[image] identify what an image is of
TextRecognize[image] recognize text from an image (OCR)
Classify[training,data] classify data on the basis of training examples
Nearest[list,item] find what element of list is nearest to item
FindClusters[list] find clusters of similar items
NearestNeighborGraph[list,n] connect elements of list to their n nearest neighbors
Dendrogram[list] make a hierarchical tree of relations between items
FeatureSpacePlot[list] plot elements of list in an inferred feature space
How come I'm getting different results from the ones shown here?
How does ImageIdentify work?
It's based on artificial neural networks inspired by the way brains seem to work. It's been trained with millions of example images, from which it's progressively learned to make distinctions. And a bit like in the game of twenty questions, by using enough of these distinctions it can eventually determine what an image is of.
How many kinds of things can ImageIdentify recognize?
What makes ImageIdentify give a wrong answer?
A common cause is that what it's asked about isn't close enough to anything it's been trained on. This can happen if something is in an unusual configuration or environment (for example, if a boat is not on a bluish background). ImageIdentify usually tries to find some kind of match, and the mistakes it makes often seem very humanlike.
Can I ask ImageIdentify the probabilities it assigns to different identifications?
How many examples does Classify typically need to work well?
If the general area (like everyday images) is one it already knows well, then as few as a hundred. But in areas that are new, it can take many millions of examples to achieve good results.
How does Nearest figure out a distance between colors?
It uses the function ColorDistance, which is based on a model of human color vision.
How does Nearest determine nearby words?
By looking at those at the smallest EditDistance, that is, reached by the smallest number of single-letter insertions, deletions and substitutions.
What features does FeatureSpacePlot use?
There's no easy answer. When it's given a collection of things, it'll learn features that distinguish them, though it's typically primed by having seen many other things of the same general type (like images).
Tech Notes