Dgraph DQL Tour

More Data

A bigger dataset

OK, we are off to a good start with Dgraph and DQL. Let’s move it up a few notches.

Dgraph can also do query aggregation, geo-queries, string querying and more. For all that, though, let’s move on from the small datasets we started with and try something much bigger.

In our GitHub repository you’ll find a dataset about movies, directors and actors.

Download it from the repository and save it into the ~/dgraph directory, or fetch it by running the following in a terminal:

cd ~/dgraph
wget "https://github.com/dgraph-io/tutorial/blob/master/resources/1million.rdf.gz?raw=true" -O 1million.rdf.gz -q

Or, using curl (which is available in Dgraph's containers):

curl -L -o 1million.rdf.gz "https://github.com/dgraph-io/tutorial/blob/master/resources/1million.rdf.gz?raw=true"
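Before loading, it can be worth checking that the download is a real gzip archive and not an HTML page, which is what typically gets saved if the ?raw=true suffix is dropped or, with curl, if -L is omitted. The helper below is only a sketch: check_rdf_gz is a hypothetical name, not part of Dgraph, and it assumes the file holds one triple per line, so the count is approximate if the file contains comment lines.

```shell
# Hypothetical helper (not a Dgraph command): verify a gzipped RDF file
# and print the number of non-blank lines it contains.
check_rdf_gz() {
  file="$1"
  # gzip -t exits non-zero when the file is truncated or is not gzip data
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt or not gzip: $file" >&2
    return 1
  fi
  # Assumption: one triple per line, so non-blank lines approximate the triple count
  gzip -dc "$file" | grep -c -v '^[[:space:]]*$'
}
```

Running check_rdf_gz ~/dgraph/1million.rdf.gz should print a number close to one million if the download succeeded.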

Run the schema mutation using the run button, and then load the dataset into Dgraph from the terminal. You may need to restart the Alpha with --cache_mb 4096 to handle the bigger dataset.

docker exec -it dgraph dgraph live -f 1million.rdf.gz --alpha localhost:9080 --zero localhost:5080 -c 1

There are around one million triples in the dataset. Dgraph reports back exactly how many triples were loaded and how long it took.

It’s a big database of movies, but it won’t trouble Dgraph. It is, however, big enough for us to use more complex queries.

This dataset is a one-million-triple subset of a larger dataset of movies that contains around 230,000 actors and around 86,000 movies. We’ve made a subset so that it loads quickly while keeping enough complexity to yield interesting results. But, because it’s a subset, you might find your favorite actor isn’t there, or that some actors are missing some of their movies.

Later in the tutorial, you might need to refer back to this page (or issue a schema query) to check the indexes or schema types. Remember that we can’t apply functions to edges that aren’t indexed, and we’ll learn that some functions require particular indexes.
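As a rough sketch of the schema query mentioned above: DQL accepts schema {} to list every predicate, or schema(pred: [...]) { ... } to inspect specific ones. The schema_query helper below is hypothetical, and the predicate name in the usage comment is an assumption about this dataset. The commented curl line further assumes the Alpha's HTTP port is reachable at localhost:8080 and that your Dgraph version accepts the application/dql content type (older releases expect application/graphql+- instead).

```shell
# Hypothetical helper: print a DQL schema query for the given predicates,
# or for the whole schema when called with no arguments.
schema_query() {
  if [ "$#" -eq 0 ]; then
    printf 'schema {}\n'
  else
    # Join the arguments with ", " and strip the trailing separator
    preds=$(printf '%s, ' "$@")
    printf 'schema(pred: [%s]) { type index tokenizer }\n' "${preds%, }"
  fi
}

# Usage (assumed endpoint, content type and predicate name; adjust to your setup):
# schema_query name | curl -s -H "Content-Type: application/dql" -X POST localhost:8080/query -d @-
```

The response shows, for each predicate, its type, whether it is indexed, and which tokenizers the index uses, which is what you need to know before applying a function to an edge.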
