45 Datasets
Create a simple dataset that can be viewed as having 2 rows and 3 columns:
 In[1]:=
 Out[1]=
 In[2]:=
 Out[2]=
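As a minimal sketch of such a dataset, assuming row keys "a" and "b", column keys "x", "y" and "z", and purely illustrative values, one could build it from an association of associations and then pick out a single element by giving a row key followed by a column key:

```wolfram
(* Illustrative values; the keys "a","b" and "x","y","z" are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Element in row "b", column "z" *)
data["b", "z"]   (* 7 *)
```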
You can first extract the whole b row, then get the z element of the result:
 In[3]:=
 Out[3]=
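The two-step extraction might look like this, again with assumed keys and illustrative values:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* First take the whole "b" row, then the "z" element of that row *)
data["b"]["z"]   (* 7 *)
```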
You can also just get the whole b row of the dataset. The result is a new dataset, which for ease of reading happens to be displayed in this case as a column.
Generate a new dataset from the b row of the original dataset:
 In[4]:=
 Out[4]=
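A sketch of extracting a row as a dataset of its own, under the same assumed keys and illustrative values:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* The "b" row, returned as a new Dataset *)
rowB = data["b"];

(* Normal shows the underlying association *)
Normal[rowB]   (* <|"x" -> 5, "y" -> 10, "z" -> 7|> *)
```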
Here is the dataset that corresponds to the z column for all rows.
Generate a dataset consisting of the z column for all rows:
 In[5]:=
 Out[5]=
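Using All in the row position keeps every row, so a column can be extracted like this (assumed keys, illustrative values):

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* The "z" column for all rows *)
colZ = data[All, "z"];
Normal[colZ]   (* <|"a" -> 3, "b" -> 7|> *)
```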
Get totals for each row by applying Total to all columns for all the rows:
 In[6]:=
 Out[6]=
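A sketch of row totals, assuming the same illustrative dataset; Total in the column position is applied to each row's association:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Total of the values in each row *)
rowTotals = data[All, Total];
Normal[rowTotals]   (* <|"a" -> 6, "b" -> 22|> *)
```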
If we use f instead of Total, we can see what's going on: the function is being applied to each of the row associations.
Apply the function f to each row:
 In[7]:=
 Out[7]=
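With an undefined symbol f in place of Total, the query might be written as follows (assumed keys, illustrative values). The second query swaps in Length as a concrete stand-in to show that each row association is what the function receives:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* f is left undefined, so the result shows f wrapped around each row *)
data[All, f]

(* A concrete stand-in: Length sees a 3-element association per row *)
rowLengths = data[All, Length];
Normal[rowLengths]   (* <|"a" -> 3, "b" -> 3|> *)
```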
Apply a function that adds the x and z elements of each association:
 In[8]:=
 Out[8]=
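One way to write such a function is with named slots, where #x and #z refer to the "x" and "z" keys of each row association (a sketch with assumed keys and illustrative values):

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Add the "x" and "z" elements of each row *)
sums = data[All, #x + #z &];
Normal[sums]   (* <|"a" -> 4, "b" -> 12|> *)
```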
 In[9]:=
 Out[9]=
You can give a function to apply to all rows too.
This extracts the value of each z column, then applies f to the association of results:
 In[10]:=
 Out[10]=
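A sketch of a function in the row position, assuming the same illustrative dataset. The symbolic f is left undefined; Max is used as a concrete stand-in to show that the function receives the whole association of extracted "z" values:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* f is applied once, to the association of "z" values *)
data[f, "z"]

(* Concrete stand-in: largest "z" value over all rows *)
data[Max, "z"]   (* 7 *)
```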
Apply f to the totals of all columns:
 In[11]:=
 Out[11]=
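Combining the two positions, a function in the row slot acts on the association of per-row totals. A sketch with assumed keys and illustrative values, again using Max as a concrete stand-in for f:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* f applied to the association of row totals *)
data[f, Total]

(* Concrete stand-in: the largest row total *)
data[Max, Total]   (* 22 *)
```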
 In[12]:=
 Out[12]=
Find totals for all rows, then pick out the total for the b row:
 In[13]:=
 Out[13]=
It's equivalent to this:
 In[14]:=
 Out[14]=
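The two equivalent forms might be written like this (assumed keys, illustrative values):

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Total every row, then pick the "b" total *)
viaAllRows = data[All, Total]["b"];   (* 22 *)

(* Pick the "b" row directly and total it *)
viaOneRow = data["b", Total];         (* 22 *)
```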
 In[15]:=
 Out[15]=
 In[16]:=
 Out[16]=
The operator form of Select is a function which can be applied to actually perform the Select operation.
Make a dataset by selecting only rows whose z column is greater than 5:
 In[17]:=
 Out[17]=
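As a sketch, the operator form Select[crit] can first be tried on an ordinary list and then used in the row position of a dataset query (assumed keys, illustrative values):

```wolfram
(* Operator form applied to a plain list *)
Select[# > 5 &][{1, 3, 5, 7, 9}]   (* {7, 9} *)

(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Keep only rows whose "z" value is greater than 5 *)
bigZ = data[Select[#z > 5 &]];
Normal[bigZ]   (* <|"b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|> *)
```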
For each row, select columns whose values are greater than 5, leaving a ragged structure:
 In[18]:=
 Out[18]=
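Moving Select to the column position applies it within each row. With the illustrative values assumed here, the "a" row has no values above 5 and is left empty, which is what makes the result ragged:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Within each row, keep only values greater than 5 *)
ragged = data[All, Select[# > 5 &]];
Normal[ragged]   (* <|"a" -> <||>, "b" -> <|"y" -> 10, "z" -> 7|>|> *)
```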
Normal turns the dataset into an ordinary association of associations:
 In[19]:=
 Out[19]=
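A small sketch of Normal, assuming the same illustrative dataset; the result is no longer a Dataset but a plain association whose values are associations:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

plain = Normal[data];

Head[plain]      (* Association *)
plain["b", "y"]  (* 10 *)
```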
Many Wolfram Language functions have operator forms.
 In[20]:=
 Out[20]=
SortBy has an operator form:
 In[21]:=
 Out[21]=
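For instance, SortBy[f] on its own is a function that can be applied to a list later. A minimal sketch, using Abs as an arbitrary choice of sorting function:

```wolfram
(* Operator form: build the sorting function first, apply it second *)
sortByAbs = SortBy[Abs];

sortByAbs[{-3, 1, -2}]   (* {1, -2, -3} *)
```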
Sort rows according to the value of the difference of the x and y columns:
 In[22]:=
 Out[22]=
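In a dataset query, the operator form of SortBy goes in the row position. A sketch with assumed keys and illustrative values, where x - y is -1 for row "a" and -5 for row "b", so "b" comes first:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Sort rows by the difference of their "x" and "y" values *)
sorted = data[SortBy[#x - #y &]];
Keys[Normal[sorted]]   (* {"b", "a"} *)
```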
Sort the rows, and find the total of all columns:
 In[23]:=
 Out[23]=
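Sorting and totaling can be combined in one query, sketched here with the same assumed keys and illustrative values:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Sort rows by x - y, then total the values in each row *)
sortedTotals = data[SortBy[#x - #y &], Total];
Normal[sortedTotals]   (* <|"b" -> 22, "a" -> 6|> *)
```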
Sometimes you want to apply a function to each element in the dataset.
Apply f to each element in the dataset:
 In[24]:=
 Out[24]=
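A third argument in the query acts on individual elements. In this sketch (assumed keys, illustrative values) f is left undefined, and # + 1 & serves as a concrete stand-in:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* f wrapped around every element *)
data[All, All, f]

(* Concrete stand-in: add 1 to every element *)
plusOne = data[All, All, # + 1 &];
Normal[plusOne]
(* <|"a" -> <|"x" -> 2, "y" -> 3, "z" -> 4|>,
     "b" -> <|"x" -> 6, "y" -> 11, "z" -> 8|>|> *)
```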
Sort the rows before totaling the squares of their elements:
 In[25]:=
 Out[25]=
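All three positions can be used together. A sketch with assumed keys and illustrative values: rows are sorted by x - y, each element is squared, and the squares are totaled within each row:

```wolfram
(* Illustrative dataset; keys and values are assumptions *)
data = Dataset[<|
    "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
    "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* Row "b": 25 + 100 + 49 = 174; row "a": 1 + 4 + 9 = 14 *)
squares = data[SortBy[#x - #y &], Total, #^2 &];
Normal[squares]   (* <|"b" -> 174, "a" -> 14|> *)
```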
A dataset formed from a list of associations:
 In[26]:=
 Out[26]=
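When the rows have no names, a dataset can be built from a list of associations, and rows are then addressed by position. A sketch with assumed column keys and illustrative values:

```wolfram
(* Illustrative values; the column keys "x","y","z" are assumptions *)
rows = Dataset[{
    <|"x" -> 2, "y" -> 4, "z" -> 6|>,
    <|"x" -> 11, "y" -> 7, "z" -> 1|>}];

(* Element in the second row, column "y" *)
rows[2, "y"]            (* 7 *)

(* The "x" column is now a list rather than an association *)
Normal[rows[All, "x"]]  (* {2, 11} *)
```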
 In[27]:=
 Out[27]=
 In[28]:=
 Out[28]=
 In[29]:=
 Out[29]=
 In[30]:=
 Out[30]=
If we ask about the moons of Mars, we get a dataset, which we can then query further.
Get a dataset about the moons of Mars:
 In[31]:=
 Out[31]=
Drill down to make a table of radii of all the moons of Mars:
 In[32]:=
 Out[32]=
Make a dataset of the number of moons listed for each planet:
 In[33]:=
 Out[33]=
Find the total mass of all moons for each planet:
 In[34]:=
 Out[34]=
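The shape of these two queries can be tried on a small stand-in. Everything in toyPlanets below is hypothetical: the planet and moon names are placeholders and the masses are unitless made-up numbers, arranged in an assumed "Moons" → moon name → "Mass" layout. For the real data, the dataset named in the exercises note would be used instead:

```wolfram
(* Hypothetical stand-in data, NOT real planetary values *)
toyPlanets = Dataset[<|
    "PlanetA" -> <|"Moons" -> <|
        "m1" -> <|"Mass" -> 1|>,
        "m2" -> <|"Mass" -> 2|>|>|>,
    "PlanetB" -> <|"Moons" -> <|
        "m3" -> <|"Mass" -> 5|>|>|>|>];

(* Number of moons listed for each planet *)
moonCounts = toyPlanets[All, "Moons", Length];
Normal[moonCounts]   (* <|"PlanetA" -> 2, "PlanetB" -> 1|> *)

(* Total mass of all moons for each planet *)
moonMass = toyPlanets[All, "Moons", Total, "Mass"];
Normal[moonMass]     (* <|"PlanetA" -> 3, "PlanetB" -> 5|> *)
```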
Get the same result, but only for planets with more than 10 moons:
 In[35]:=
 Out[35]=
 In[36]:=
 Out[36]=
Get a dataset with moons that are more than 1% of the mass of the Earth.
For all moons, select ones whose mass is greater than 0.01 times the mass of the Earth:
 In[37]:=
 Out[37]=
Get the list of keys (i.e. moon names) in the resulting association for each planet:
 In[38]:=
 Out[38]=
 In[39]:=
 Out[39]=
 In[40]:=
 Out[40]=
Here's the whole computation in one line:
 In[41]:=
 Out[41]=
Make number line plots of the logarithms of masses for moons of each planet:
 In[42]:=
 Out[42]=
Here's how to make a word cloud of names of moons, sized according to the masses of the moons. To do this, we need a single association that associates the name of each moon with its mass.
When given an association, WordCloud determines sizes from values in the association:
 In[43]:=
 Out[43]=
The function Association combines associations:
 In[44]:=
 Out[44]=
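A minimal sketch of combining associations this way, with arbitrary placeholder keys and values:

```wolfram
(* Two small associations with placeholder contents *)
first = <|"a" -> 1, "b" -> 2|>;
second = <|"c" -> 3|>;

(* Association merges them into a single association *)
combined = Association[first, second]   (* <|"a" -> 1, "b" -> 2, "c" -> 3|> *)
```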
 In[45]:=
 Out[45]=
We've seen before that we can write something like f[g[x]] as f@g@x or x//g//f. We can also write it f[g[#]]&[x]. But what about f[g[#]]&? Is there a short way to write this? The answer is that there is, in terms of the function composition operators @* and /*.
f@*g@*h represents a composition of functions to be applied right-to-left:
 In[46]:=
 Out[46]=
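For example, with undefined symbols f, g, h and x, applying the composition shows the order in which the functions are used:

```wolfram
(* Right-to-left: h is applied first, f last *)
composed = (f @* g @* h)[x]   (* f[g[h[x]]] *)
```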
h/*g/*f represents a composition of functions to be applied left-to-right:
 In[47]:=
 Out[47]=
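The same nesting can be written with the functions listed in the order they are applied, again using undefined symbols:

```wolfram
(* Left-to-right: h is applied first, f last *)
rightComposed = (h /* g /* f)[x]   (* f[g[h[x]]] *)

(* Both operators describe the same function here *)
rightComposed === (f @* g @* h)[x]   (* True *)
```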
Here's the previous code rewritten using composition @*:
 In[48]:=
 Out[48]=
 In[49]:=
 Out[49]=
As a final example, let's look at another dataset, this time coming straight from the Wolfram Data Repository. Here's a webpage (about big meteors) from the repository:
To get the main dataset that's mentioned here, just use ResourceData.
Get the dataset just by giving its name to ResourceData:
 In[50]:=
 Out[50]=
Extract the coordinates entry from each row, and plot the results:
 In[51]:=
 Out[51]=
Make a histogram of the altitudes:
 In[52]:=
 Out[52]=
Dataset[data]    a dataset
Normal[dataset]    convert a dataset to normal lists and associations
Catenate[{assoc1, ...}]    catenate associations, combining their elements
f@*g    composition of functions (f[g[x]] when applied to x)
f/*g    right composition (g[f[x]] when applied to x)
Note: These exercises use the dataset planets=CloudGet["http://wolfr.am/7FxLgPm5"].
45.1 Make a word cloud of the planets, with weights determined by their number of moons.
Sample expected output:
 Out[]=
45.2 Make a bar chart of the number of moons for each planet.
Sample expected output:
 Out[]=
45.3 Make a dataset of the masses of the planets, sorted by their number of moons.
Sample expected output:
 Out[]=
45.4 Make a dataset of planets and the mass of each one's most massive moon.
Sample expected output:
 Out[]=
45.5 Make a dataset of masses of planets, where the planets are sorted by the largest mass of their moons.
Sample expected output:
 Out[]=
45.6 Make a dataset of the median mass of all moons for each planet.
Sample expected output:
 Out[]=
Sample expected output:
 Out[]=
45.8 Make a word cloud of countries in Central America, with the names of countries proportional to the lengths of the Wikipedia article about them.
Sample expected output:
 Out[]=
45.9 Find the maximum observed altitude in the Fireballs & Bolides dataset.
Expected output:
 Out[]=
45.10 Find a dataset of the 5 largest observed altitudes in the Fireballs & Bolides dataset.
Expected output:
 Out[]=
45.11 Make a histogram of the differences in successive peak brightness times in the Fireballs & Bolides dataset.
Expected output:
 Out[]=
45.12 Plot the nearest cities for the first 10 entries in the Fireballs & Bolides dataset, labeling each city.
Expected output:
 Out[]=
45.13 Plot the nearest cities for the 10 entries with largest altitudes in the Fireballs & Bolides dataset, labeling each city.
Expected output:
 Out[]=
Expected output:
 Out[]=
What kinds of data can datasets contain?
Any kind. Not just numbers and text but also images, graphs and lots more. There's no need for all elements of a particular row or column to be the same type.
Yes. SemanticImport is often a good way to do it.
What are databases and how do they relate to Dataset?
Databases are a traditional way to store structured data in a computer system. Databases are often set up to allow both reading and writing of data. Dataset is a way to represent data that might be stored in a database so that it's easy to manipulate with the Wolfram Language.
How does data in Dataset compare to data in an SQL (relational) database?
SQL databases are strictly based on tables of data arranged in rows and columns of particular types, with additional data linked in through foreign keys. Dataset can have any mixture of types of data, with any number of levels of nesting, and any hierarchical structure, somewhat more analogous to a NoSQL database, but with additional operations made possible by the symbolic nature of the language.