Data Science and Report Generation

The Wolfram Language has what you need to process data and publish professional reports.

Data Acquisition

Importing Data from Files

Data science starts with data, and the Wolfram Language provides many ways to access it. The built-in Import function reads several hundred commonly used file formats.

1. Import data using the default settings. Import will automatically import most common file formats as a suitable expression:
Import["ExampleData/cities.xlsx"]
If Import cannot determine the format of a file, you can specify it explicitly:
Import["ExampleData/cities.xlsx", "XLSX"]
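As a self-contained illustration of naming the format explicitly, the sketch below writes a small table to a temporary file that has no extension and then reads it back as CSV. The file name and the sample rows are made up for this example and are not part of the ExampleData collection.

```wolfram
(* Hypothetical sample rows, used only for illustration *)
rows = {{"city", "population"}, {"London", 8900000}, {"Paris", 2100000}};

(* Write to a temporary file whose name carries no extension *)
file = FileNameJoin[{$TemporaryDirectory, "cities-sample"}];
Export[file, rows, "CSV"];

(* Name the format explicitly, since the file name gives no hint *)
imported = Import[file, "CSV"]
```

Passing the format as the second argument removes any dependence on how the file happens to be named.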

It's also easy to import data into a Dataset object, a structured dataset based on a hierarchy of lists and associations. This makes it easy (and fast) to traverse large datasets.

2. Import data as a Dataset.

Data-oriented formats such as CSV, TSV, XLS and XLSX can be imported as a Dataset by specifying "Dataset" as the second argument to Import:
Import["ExampleData/cities.xlsx", "Dataset"]
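To give a feel for traversing a Dataset, here is a minimal sketch that builds one directly from a list of associations and runs a few queries. The rows and column names are hypothetical stand-ins for an imported spreadsheet.

```wolfram
(* Hypothetical rows standing in for an imported spreadsheet *)
cities = Dataset[{
   <|"city" -> "London", "population" -> 8900000|>,
   <|"city" -> "Paris", "population" -> 2100000|>,
   <|"city" -> "Berlin", "population" -> 3700000|>}];

(* Extract one column from every row *)
names = cities[All, "city"]

(* Keep only the rows that satisfy a condition *)
large = cities[Select[#population > 3000000 &]]

(* Aggregate a column *)
total = cities[Total, "population"]
```

The same query forms should apply to a Dataset returned by Import, provided its rows are associations with named columns.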

Often, you'll want to extract a particular element from a data file without importing the whole file first. With an additional argument, Import can extract just the elements you ask for.

3. Import particular elements from a data file or webpage. Many files and webpages contain elements other than the data returned by default by Import. Get a list of elements by giving "Elements" as the second argument to Import:
Import["ExampleData/cities.xlsx", "Elements"]
Specify which element to import:
Import["ExampleData/cities.xlsx", "Images"]
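Element specifications can also be nested to reach into part of a file. The sketch below assumes a throwaway workbook written to the temporary directory (the file name and rows are hypothetical) and reads its sheet names and the data of its first sheet.

```wolfram
(* Hypothetical sample rows written to a temporary workbook *)
rows = {{"city", "population"}, {"London", 8900000}, {"Paris", 2100000}};
file = FileNameJoin[{$TemporaryDirectory, "cities-sample.xlsx"}];
Export[file, rows];

(* Names of the sheets in the workbook *)
sheets = Import[file, "Sheets"]

(* Data of the first sheet only, selected with a nested element specification *)
firstSheet = Import[file, {"Data", 1}]
```

Numeric cells may come back as machine reals rather than integers, so compare values with == rather than === when checking a round trip.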

Importing Data from an API

The Wolfram Language makes it easy to connect to external services. In this example, the locations of bikeshare stations in London are retrieved from an API and plotted as a geographic histogram:

APIURL="http://api.citybik.es/barclays-cycle-hire.json";
Map[{#["lat"]/1000000,#["lng"]/1000000}&,Import[APIURL,"RawJSON"]]//GeoHistogram
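To see what the "RawJSON" import produces without contacting the service, the following sketch parses a small JSON string offline. The station names and coordinates are invented, and the integer encoding (degrees multiplied by 1000000) is an assumption inferred from the division in the code above, not a documented property of the API.

```wolfram
(* Hypothetical payload imitating the assumed shape of the API response *)
json = "[{\"name\":\"Station A\",\"lat\":51529163,\"lng\":-109970},
         {\"name\":\"Station B\",\"lat\":51499606,\"lng\":-197574}]";

(* "RawJSON" turns JSON objects into associations and arrays into lists *)
stations = ImportString[json, "RawJSON"];

(* Convert the integer coordinates to degrees, one {lat, lng} pair per station *)
coords = Map[{#["lat"]/1000000., #["lng"]/1000000.} &, stations]
```

Each element of stations is an association, which is why fields can be looked up with #["lat"] and #["lng"] inside the pure function.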

Analysis and Visualization

Automated Analysis

The Wolfram Language has thousands of built-in functions that allow you to focus on your project, not the technicalities of how specific actions are performed. While you can fully specify every detail, the default settings for the functions are designed to work well in most cases, resulting in short, readable code, even for very complex tasks. In this example, bivariate data is automatically clustered with the FindClusters function.

Find and visualize clusters in bivariate data:
data = {{-1.1, 2.6}, {3.9, -0.8}, {4.2, -3.7}, {3.3, 3.5}, {3.9,.2}, {4.1, -4.8}, {3.8, 3.7}, {5.6, 0.1}, {3.1, -5.2}, {-0.9, 2.3}, {2.9, 4.1}, {-2.3, 3.9}, {-2.5, 3.}, {2.6, -5.5}, {5.2, 1.9}, {-0.7, 1.3}, {0.9, 2.8}, {-1.5, 3.3}, {3.8, 1.2}, {2.6, -5.1}, {-0.8, 3.2}, {4.7, 0.7}, {3., 3.}, {3.9, 3.6}, {4.5, 1.4}, {4.2, 1.3}, {-1.1, 2.6}, {4.8, 2.4}, {3.3, -3.5}, {3.2, -4.6}, {3.3, -4.9}, {3., 3.5}, {0.7, 2.1}, {3.2, -4.3}, {-2., 0.5}, {-1.2, 2.}, {-1.6, 1.8}, {-3.5, 3.7}, {4.8, 0.2}, {3.3, 2.4}, {-0.1, 2.1}, {-1.3, 2.5}, {4.4, 3.9}, {3.5, 0.2}, {0.1, 2.9}, {-1., 1.6}, {-1.4, 4.5}, {3.2, 2.5}, {-1.6, 2.4}, {2.6, -5.1}};
ListPlot[FindClusters[data]]
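When the defaults are not what you want, the details can be spelled out. This minimal sketch, using a small made-up point set, asks FindClusters for three clusters and names the clustering method explicitly; the choice of "KMeans" here is only an illustration, not a recommendation.

```wolfram
(* Hypothetical points forming three visually separate groups *)
points = {{0., 0.}, {0.1, 0.2}, {5., 5.}, {5.1, 4.9}, {10., 0.}, {9.8, 0.3}};

(* Request three clusters and choose the method instead of relying on defaults *)
clusters = FindClusters[points, 3, Method -> "KMeans"];

(* Number of points assigned to each cluster *)
Length /@ clusters

(* Each cluster is plotted as its own series *)
ListPlot[clusters]
```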

High-level functions like FindDistribution can analyze your data and figure out which of over 35 distributions best fits your data using a variety of statistical methods.

Generate data sampled from an exponential distribution:
\[ScriptCapitalD] = ExponentialDistribution[1]; data = RandomVariate[\[ScriptCapitalD], 1000];
Find the distribution that best fits the data:
estimated\[ScriptCapitalD] = FindDistribution[data]
Compare the PDFs for the original and estimated distributions:
Plot[{PDF[\[ScriptCapitalD], x], PDF[estimated\[ScriptCapitalD], x]}, {x, 0, 10}, PlotLegends -> {"\[ScriptCapitalD]", "e\[ScriptCapitalD]"}]
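If you would rather inspect several candidates, or already know the family and only need its parameters, the following sketch shows both variations. The seed value and the symbol lambda are arbitrary choices made for this illustration.

```wolfram
(* Fixed seed so that repeated runs draw the same sample *)
SeedRandom[1234];
sample = RandomVariate[ExponentialDistribution[1], 1000];

(* Ask for up to three of the best-scoring candidate distributions *)
candidates = FindDistribution[sample, 3]

(* Fit only the rate parameter of a known family; lambda is a placeholder symbol *)
fit = EstimatedDistribution[sample, ExponentialDistribution[lambda]]
```

EstimatedDistribution is the narrower tool: it assumes the family and estimates parameters, whereas FindDistribution searches across families.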

Cloud Deployment

You will often want to share a program with others, and the Wolfram Language makes it easy to turn your code into a standalone, interactive webpage. The CloudDeploy function publishes your code to Wolfram Research servers and makes it accessible either to everyone or only to the people you grant permission. In this example, an interactive program that recognizes a molecule in an image is turned into a public webpage.

1. Make the content to be published:
LabelMoleculeDrawing[image_]:=MoleculePlot[MoleculeRecognize[image],<|"Carbonyl"->Bond[{"C","O"},"Double"],"Ring carbons"->Atom["C","RingAtomQ"->True]|>]
2. Use the CloudDeploy function to publish to the cloud:
CloudDeploy[FormFunction[{"molecule" -> "Image"}, LabelMoleculeDrawing[#image] &,"JPEG", AppearanceRules -> {"Title" -> "Molecule recognizer"}], Permissions -> "Public"]
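To restrict access instead of publishing to everyone, permissions can be granted per user. The sketch below is only an outline under stated assumptions: the collaborator address is a placeholder for a real Wolfram ID, the form itself is a trivial stand-in, and running it requires being signed in to the Wolfram Cloud.

```wolfram
(* Placeholder address; replace with the Wolfram ID of a real collaborator *)
collaborator = "colleague@example.com";

(* Trivial stand-in form: returns the integers from 1 to n *)
form = FormFunction[{"n" -> "Integer"}, Range[#n] &];

(* Deploy so that only the named user (and the owner) can view and run the form *)
obj = CloudDeploy[form, Permissions -> {collaborator -> {"Read", "Execute"}}]
```

CloudDeploy returns a CloudObject whose URL can be passed on to the people who were granted access.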

Get Started

Learning Resources

Learning Paths

Try it now, learn later

Want to just try it out? Get a feel for what the Wolfram Language is like while trying out real code samples focused on building and deploying web applications.

Try these instantly! Access with a free Wolfram Cloud account

Get certified for free in the Wolfram Language

We've made it easy to learn the Wolfram Language your way. Try our free interactive course and earn a certification.

Start the interactive online course now! About 20 hours for completion

Go Further with Data Science

Want to keep exploring data science?

If you want to see more of what Wolfram offers for data science, read about the Wolfram approach to data science & AI. You'll find:

  • Downloadable examples
  • Links to documentation
  • Talks, presentations and lectures
  • Online classes
  • Technical information

Recommended Product