Work with a Large Database

This example connects to a terabyte-scale database and runs some basic queries that would be impractical to perform in memory.

OpenStreetMap is a collaborative effort to generate a free map of the world. The project was created in 2004, and its more than two million users have generated over a terabyte of data. As such, it is a great example database for showcasing out-of-core data science. Instructions on how to get the data and set up a database server can be found here.

Register the database so that its tables can be used as entities.

This is a very large database; its largest table "planet_osm_nodes" takes up almost 200 GB on disk. Here is how many rows it contains.
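As a rough illustration of why a row count is cheap even on a huge table, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a small in-memory stand-in for the real server; the table name comes from the text, while the columns and rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real database server (assumption: the
# columns and the three sample rows are invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE planet_osm_nodes (id INTEGER PRIMARY KEY, lat INTEGER, lon INTEGER)"
)
conn.executemany(
    "INSERT INTO planet_osm_nodes (id, lat, lon) VALUES (?, ?, ?)",
    [(1, 525200000, 134050000), (2, 488566000, 23522000), (3, 407128000, -740060000)],
)

# The database engine does the counting; only a single number
# travels back to the client, regardless of the table size.
(node_count,) = conn.execute("SELECT COUNT(*) FROM planet_osm_nodes").fetchone()
print(node_count)
```

The point of the sketch is that the client never holds the table itself, only the aggregated result.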

Suppose you wanted to find all the streets whose names contain "Wolf".
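A substring search like this maps onto a pattern-matching filter that runs inside the database. The following Python sketch shows the idea with sqlite3 as an in-memory stand-in; the table name planet_osm_line, its name and highway columns, and all rows are assumptions made for illustration, not taken from the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of line features; a non-NULL "highway" value
# is assumed to mark a feature as a street.
conn.execute("CREATE TABLE planet_osm_line (name TEXT, highway TEXT)")
conn.executemany(
    "INSERT INTO planet_osm_line (name, highway) VALUES (?, ?)",
    [
        ("Wolf Street", "residential"),
        ("Wolf Street", "residential"),  # same street mapped as two segments
        ("Wolfgang Lane", "service"),
        ("Old Wolf Road", "tertiary"),
        ("Main Street", "primary"),
        ("Wolf Creek", None),            # not a street, so it is filtered out
    ],
)

# '%' matches any run of characters, so '%Wolf%' means "contains Wolf".
# Note: LIKE is case-insensitive for ASCII in SQLite by default,
# whereas PostgreSQL's LIKE is case-sensitive.
rows = conn.execute(
    "SELECT name FROM planet_osm_line "
    "WHERE highway IS NOT NULL AND name LIKE ? ORDER BY name",
    ("%Wolf%",),
).fetchall()
matches = [name for (name,) in rows]
print(matches)
```

Passing the pattern as a parameter rather than formatting it into the SQL string keeps the query safe if the search term comes from user input.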

Unfortunately these contain quite a few duplicates, but you can check for the number of distinct names.
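Duplicates arise naturally when one street is stored as several segments. A minimal Python sketch of counting total versus distinct matches follows, again with sqlite3 standing in for the real server; the planet_osm_line table, its name column, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column table; the sample names are invented.
conn.execute("CREATE TABLE planet_osm_line (name TEXT)")
conn.executemany(
    "INSERT INTO planet_osm_line (name) VALUES (?)",
    [
        ("Wolf Street",),
        ("Wolf Street",),
        ("Wolfgang Lane",),
        ("Old Wolf Road",),
        ("Main Street",),
    ],
)

# COUNT(name) counts every matching row, while COUNT(DISTINCT name)
# collapses repeated names before counting.
total, distinct = conn.execute(
    "SELECT COUNT(name), COUNT(DISTINCT name) "
    "FROM planet_osm_line WHERE name LIKE ?",
    ("%Wolf%",),
).fetchone()
print(total, distinct)
```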

Another interesting thing to look at is the "planet_osm_table", which contains lots of metadata about various objects. For example, you can check how many trees were mapped.
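Counting trees amounts to filtering on a tag column and counting the rows that remain. The Python sketch below illustrates this with sqlite3 as an in-memory stand-in. The table name is taken from the text; the "natural" column (where a value of tree is assumed to mark a mapped tree) and all rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NATURAL is an SQL keyword, so the column name must be double-quoted.
conn.execute(
    'CREATE TABLE planet_osm_table (osm_id INTEGER, "natural" TEXT, sport TEXT)'
)
conn.executemany(
    "INSERT INTO planet_osm_table VALUES (?, ?, ?)",
    [
        (1, "tree", None),
        (2, "tree", None),
        (3, "peak", None),
        (4, None, "tennis"),
        (5, "tree", None),
    ],
)

# Filter and count inside the database; rows whose tag is NULL or
# something other than 'tree' are ignored.
(tree_count,) = conn.execute(
    "SELECT COUNT(*) FROM planet_osm_table WHERE \"natural\" = 'tree'"
).fetchone()
print(tree_count)
```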

Or you can find out what the most common sport structures are.
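Finding the most common values is a group-and-count aggregation followed by a sort. Here is a minimal Python sketch with sqlite3 as an in-memory stand-in; the table name is taken from the text, while the sport column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planet_osm_table (osm_id INTEGER, sport TEXT)")
# Invented sample rows; NULL means the object has no sport tag.
sample = ["soccer", "tennis", "soccer", None, "basketball", "tennis", "soccer", None]
conn.executemany(
    "INSERT INTO planet_osm_table (osm_id, sport) VALUES (?, ?)",
    list(enumerate(sample, start=1)),
)

# Group by tag value, count each group, and keep only the top entries.
# The secondary sort on the name makes ties deterministic.
top_sports = conn.execute(
    "SELECT sport, COUNT(*) AS n FROM planet_osm_table "
    "WHERE sport IS NOT NULL "
    "GROUP BY sport ORDER BY n DESC, sport LIMIT 2"
).fetchall()
print(top_sports)
```

Because the grouping happens on the server, the client receives one row per category rather than one row per mapped object, which keeps the result small enough to plot directly.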

Visualize the result.

