Releasing Dgraph v0.7.1

The Dgraph team is super excited to present v0.7.1 of Dgraph. This version is the biggest step we’ve taken towards our production aim of v1.0. We’ve implemented 90% of the features planned in our product roadmap, including replication and high availability using the RAFT protocol, indexing, filtering, sorting, geospatial queries, and backups.

Today, I’m going to talk about these new features, and what you can expect from Dgraph.

Note: This is going to be a long blog post. If you just want to try out Dgraph, you can jump straight to our 5-step tutorial to get started.

Speed

Dgraph v0.7.1 is the fastest version of Dgraph we’ve ever released. Live data loading is now so fast that we realized we no longer needed a separate offline batch data loader. So, we removed all that complexity and just wrote a simple script to send mutation queries to a live Dgraph instance via gRPC.
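As a rough illustration of what such a loader does, the sketch below groups RDF N-Quad lines into batches and wraps each batch in a mutation block. This is not the actual loader script: the helper names `batch_mutations` and `wrap`, the batch size, and the exact text layout of the mutation are assumptions made for illustration, and the gRPC send step is left out entirely.

```python
from typing import Iterable, Iterator, List


def wrap(batch: List[str]) -> str:
    """Wrap N-Quad lines in a mutation block (layout assumed for illustration)."""
    body = "\n".join("    " + line for line in batch)
    return "mutation {\n  set {\n" + body + "\n  }\n}"


def batch_mutations(nquads: Iterable[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield one mutation query string per batch of N-Quad lines (hypothetical helper)."""
    batch: List[str] = []
    for line in nquads:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        batch.append(line)
        if len(batch) == batch_size:
            yield wrap(batch)
            batch = []
    if batch:
        yield wrap(batch)  # flush the final, possibly smaller, batch
```

Each yielded string could then be sent to a running server as one request; batching keeps the number of round trips low.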

You can read more about loading and querying performance here.

Data Persistence, Sharding, High Availability, and Crash Resilience

Dgraph has been built from the ground up to run in Google-scale production environments. Running terabytes of structured data over commodity hardware has been the main aim since the beginning. To achieve that, Dgraph needs to be able to deal with server failures without losing any data or dropping any queries.

We’ve implemented RAFT, a distributed consensus algorithm, within Dgraph to handle such failures. All writes go through a group of servers and are only acknowledged to the client once they’ve been applied to a majority of servers in the group. Thus, even if some of these servers crash (as long as a majority stays up), the data can still be accessed and updated, and queries keep flowing without the user being affected.
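The majority rule is easy to state as arithmetic. The sketch below is not a RAFT implementation and is not taken from Dgraph's code; it only shows, with hypothetical function names, how many acknowledgements a group needs before a write counts as applied, and how many crashed servers that leaves room for.

```python
def majority(group_size: int) -> int:
    """Smallest number of servers that forms a majority of the group."""
    return group_size // 2 + 1


def is_acknowledged(acks: int, group_size: int) -> bool:
    """A write is acknowledged once a majority of the group has applied it."""
    return acks >= majority(group_size)


def tolerated_failures(group_size: int) -> int:
    """Servers that can crash while a majority is still reachable."""
    return group_size - majority(group_size)
```

For a group of 5 servers, `majority(5)` is 3, so the group keeps accepting writes with up to 2 servers down. A group of 4 also needs 3 acknowledgements but tolerates only 1 failure, which is why odd group sizes are the usual choice.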

Also, new servers can be introduced into an existing cluster by providing the IP address of any healthy server in the cluster. They will automatically pick up the other members, pull in data shards from healthy nodes, join the groups and become part of the cluster.

Such functionality has never been an aim for graph databases in the past. But we strongly believe that graph databases can be run as primary databases, not just add-ons. To that end, it’s important that they be able to survive machine crashes and avoid data loss.

This functionality makes Dgraph the most production-ready graph database on the market.

You can read more about running multiple instances of Dgraph, sharding, and distributing data in a cluster here. In fact, we created a short video to show you how Dgraph can recover from crashes directly after a load without losing any data.

Backups

Dgraph believes in standards. We heartily use the existing RDF NQuad standard for data input. Now, we’re making it easy to export data from Dgraph by writing our backups in RDF NQuad format. This makes it easy to feed Dgraph data into other systems, upgrade Dgraph versions, or switch over to another graph database. You can read more about backup here.

Indexing

Dgraph can now index various scalar values to allow sorting, term matching, and inequality filters. We support these value types: int, float, string, bool, date, datetime, geo, uid.

String values get tokenized using ICU, and allow the allof and anyof functions. geo values allow for geo-indexing and support the geo functions mentioned below. The uid type indicates that the predicate edge points to another entity node, which is useful for generating edges in the reverse direction automatically. The rest support sorting, equality, and inequality functions.

If you want to generate the index for a particular predicate, mention its type and specify the @index keyword. Similarly, to generate reverses for a uid predicate, you can specify the @reverse keyword.

Here’s an example schema file for Freebase film data:

scalar (
  film.director.film             : uid    @reverse
  film.film.genre                : uid    @reverse
  film.film.initial_release_date : date   @index
  film.film.rating               : uid    @reverse
  loc                            : geo    @index
  type.object.name.en            : string @index
  type.object.name.hi            : string @index
  type.object.name.ta            : string @index
)

You can read more about Dgraph schema here.

Reverse Edges

Each graph edge is unidirectional. But a lot of times, you need to have both the forward and backward edges. For those cases, Dgraph now provides a @reverse keyword, which can be applied to predicates of uid type. This would trigger automatic generation of the reverse edges, to allow querying in the reverse direction.
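To make the idea concrete, here is a minimal in-memory sketch of what generating reverse edges amounts to. It is not how Dgraph stores edges internally; the function name `build_reverse` and the tuple layout are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object), object is a uid


def build_reverse(edges: Iterable[Edge]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (object, predicate) to the set of subjects that point at it."""
    reverse: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, predicate, obj in edges:
        reverse[(obj, predicate)].add(subject)
    return reverse
```

With a forward edge from a director to a film, the reverse map answers "who directed this film?" without scanning every director.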

You can read more about reverse edges here.

Sorting

You can now sort the results by any indexed predicate (except string and geo, of course). For example, you can now get a list of films directed by Steven Spielberg sorted by initial release date.

{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film(order: film.film.initial_release_date) {
      type.object.name.en
      film.film.initial_release_date
    }
  }
}

# To sort in descending order, just use orderdesc instead of order.

You can read more about sorting here.

Functions

Functions are a great way to provide functionality with a simple and clear interface. Dgraph functions can be used as both starting points to queries and as filters.

Here’s a list of functions we introduced in v0.7.1:

  • anyof(predicate, "space separated list of terms") : Entities whose value for the predicate has any of the terms specified.
  • allof(predicate, "space separated list of terms") : Entities whose value for the predicate has all of the terms specified.
  • leq(predicate, "value") : Entities whose value for the predicate is less than or equal to specified value.
  • geq(predicate, "value") : Entities whose value for the predicate is greater than or equal to specified value.

Both anyof and allof functions work based on an index generated by tokenizing string values. We use ICU, which has vast support for human languages and is used by major projects like Google Web Search, Chrome, Mac OSX, Lucene, etc.
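The sketch below shows the general shape of a term index and how anyof and allof map onto set union and intersection. It is a simplification under stated assumptions: the `tokenize` function here just lowercases and splits on whitespace as a stand-in for ICU, and the `TermIndex` class is hypothetical rather than Dgraph's actual index structure.

```python
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Stand-in for ICU tokenization: lowercase, split on whitespace only."""
    return set(text.lower().split())


class TermIndex:
    """Maps each term to the set of entity ids whose value contains it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uid: str, value: str) -> None:
        for term in tokenize(value):
            self.postings[term].add(uid)

    def anyof(self, terms: str) -> Set[str]:
        """Entities matching at least one term: union of posting sets."""
        result: Set[str] = set()
        for term in tokenize(terms):
            result |= self.postings.get(term, set())
        return result

    def allof(self, terms: str) -> Set[str]:
        """Entities matching every term: intersection of posting sets."""
        sets = [self.postings.get(term, set()) for term in tokenize(terms)]
        return set.intersection(*sets) if sets else set()
```

Because both functions only touch the posting sets of the query terms, the cost depends on how many entities contain those terms rather than on the total number of entities.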

Upcoming in v0.7.2:

  • eq(predicate, "value"): Entities whose value for predicate is equal to specified value.
  • le(predicate, "value"): Entities whose value for predicate is less than specified value.
  • ge(predicate, "value"): Entities whose value for predicate is greater than specified value.

To enable these functions, provide a schema that specifies the scalar type for each predicate. For example:

scalar(
  name          : string @index # anyof, allof
  age           : int    @index # leq, geq
  date_of_birth : date   @index # leq, geq
)
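Inequality functions such as leq and geq are a natural fit for an index kept in sorted order. The following sketch uses Python's `bisect` module to show the idea; the `SortedIndex` class is hypothetical and says nothing about how Dgraph lays out its index on disk.

```python
import bisect
from typing import Any, List


class SortedIndex:
    """Keeps (value, uid) pairs ordered by value for range lookups."""

    def __init__(self) -> None:
        self._keys: List[Any] = []  # indexed values, kept sorted
        self._uids: List[str] = []  # entity ids aligned with _keys

    def add(self, uid: str, value: Any) -> None:
        pos = bisect.bisect_right(self._keys, value)
        self._keys.insert(pos, value)
        self._uids.insert(pos, uid)

    def leq(self, value: Any) -> List[str]:
        """Entities whose value is less than or equal to the given value."""
        return self._uids[: bisect.bisect_right(self._keys, value)]

    def geq(self, value: Any) -> List[str]:
        """Entities whose value is greater than or equal to the given value."""
        return self._uids[bisect.bisect_left(self._keys, value):]
```

The same structure also explains why an indexed predicate can be sorted cheaply: walking `_uids` front to back or back to front gives ascending or descending order.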

You can read more about functions here.

Geospatial functions

Geospatial queries are important for building location-aware search and recommendations. Being able to find nearby restaurants that serve sushi, bars in the city that play jazz, or friends who are visiting your neighborhood is crucial to such use cases. We couldn’t imagine building a database without providing full-fledged support for geo queries, and in v0.7.1, we added geo indexing.

Dgraph now supports four geospatial functions:

  • near(predicate, geo-location) : Finds all entities lying within a specified distance from a point.
  • within(predicate, geo-polygon) : Finds all entities lying within a specified region.
  • contains(predicate, geo-location) : Finds all enclosures for a specified point or region.
  • intersects(predicate, geo-polygon) : Finds all entities which intersect a specified region.

For example, the following query uses near to find entities close to a given point:

$ curl localhost:8080/query -XPOST -d $'
{
  tourist( near("loc", "{\'type\':\'Point\', \'coordinates\': [-122.469829, 37.771935]}", "1000" ) ) {
    name
  }
}' | python -m json.tool | grep name

            "name": "Steinhart Aquarium"
            "name": "Spreckels Temple of Music"
            "name": "Pioneer Log Cabin"
            "name": "Conservatory of Flowers"
            "name": "De Young Museum"
            "name": "Chinese Pavillion"
            "name": "Japanese Tea Garden"
            "name": "Peace Lantern"
            "name": "San Francisco Botanical Garden"
            "name": "Morrison Planetarium"
            "name": "California Academy of Sciences"
            "name": "Hamon Tower"
            "name": "National AIDS Memorial Grove"
            "name": "La Rose des Vents"
            "name": "Strawberry Hill"
            "name": "Buddha"
            "name": "Rose Garden"

You can read more about Geolocation functionality here.
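For intuition about what near computes, here is a small sketch that filters points by great-circle distance using the haversine formula. It makes several assumptions: the distance is treated as meters (the unit is not stated above), the point names and coordinates in the usage note are made up, and a linear scan stands in for the geo index a real database would use. Coordinates follow the GeoJSON order of longitude first, then latitude.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

LonLat = Tuple[float, float]  # GeoJSON order: (longitude, latitude)


def haversine_m(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1 = a
    lon2, lat2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def near(points: Dict[str, LonLat], center: LonLat, max_m: float) -> List[str]:
    """Names of all points within max_m meters of center (linear scan)."""
    return [name for name, loc in points.items() if haversine_m(center, loc) <= max_m]
```

Calling `near({"close": (-122.469829, 37.772935), "far": (-122.4194, 37.7749)}, (-122.469829, 37.771935), 1000)` keeps only the first point, which is roughly 110 meters away, while the second is several kilometers away.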

Filtering

Dgraph now supports complex filtering operations. Filters can be combined with && (and) and || (or), using round brackets to specify the order of evaluation. For example, (A || (B && C)), as demonstrated in the following query.

{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film @filter(
      allof("type.object.name.en", "jones indiana") ||
      (anyof("type.object.name.en", "jurassic") && anyof("type.object.name.en", "park")))  {
      type.object.name.en
      film.film.initial_release_date
    }
  }
}
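The boolean structure of that filter can be mirrored directly in ordinary code. The sketch below evaluates the same (A || (B && C)) expression against film titles in memory; the helper names and the whitespace tokenization are assumptions for illustration, not Dgraph's evaluation strategy.

```python
from typing import Set


def terms(text: str) -> Set[str]:
    """Simplified tokenization: lowercase and split on whitespace."""
    return set(text.lower().split())


def allof(value: str, query: str) -> bool:
    """True if the value contains every term in the query."""
    return terms(query) <= terms(value)


def anyof(value: str, query: str) -> bool:
    """True if the value contains at least one term in the query."""
    return bool(terms(query) & terms(value))


def keep(title: str) -> bool:
    """Same shape as the filter above: A || (B && C)."""
    return allof(title, "jones indiana") or (
        anyof(title, "jurassic") and anyof(title, "park")
    )
```

Applied to the titles "Indiana Jones and the Temple of Doom", "Jurassic Park", and "Jaws", `keep` accepts the first two and rejects the third.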

With that, I’ll finish the blog post. If you found these features interesting, try out the 5-step tutorial to get started with Dgraph. Let us know what you think!

I’ll be giving a talk about Dgraph at the Go Meetup in Sydney on 19th Jan and at GopherCon India on 24-25th Feb. So, come talk to me about Dgraph.