The Dgraph team is super excited to present v0.7.1 of Dgraph. This version is the biggest step we’ve taken towards our production aim of v1.0. We’ve implemented 90% of the features we had planned in our product roadmap, including replication and high availability using the Raft protocol, indexing, filtering, sorting, geospatial queries, and backups.
Today, I’m going to talk about these new features, and what you can expect from Dgraph.
Note: This is going to be a long blog post. If you just want to try out Dgraph, you can jump straight to our 5-step tutorial to get started with Dgraph.
Dgraph v0.7.1 is the fastest version of Dgraph we’ve ever released. Live data loading is now so fast that we realized we no longer needed a separate offline batch data loader. So, we removed all that complexity and just wrote a simple script to send mutation queries to a live Dgraph instance via gRPC.
You can read more about loading and querying performance here.
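To give a feel for what such a loader script does, here is a minimal sketch in Python. It groups RDF N-Quad lines into fixed-size batches and wraps each batch in a mutation query, assuming the mutation { set { ... } } query form. The helper names batch_mutations and wrap, the batch size, and the sample data are made up for this illustration; the real script and its gRPC transport are not shown here.

```python
from typing import Iterable, Iterator, List


def wrap(lines: List[str]) -> str:
    """Wrap N-Quad lines in a single mutation block (hypothetical helper)."""
    body = "\n".join("    " + line for line in lines)
    return "mutation {\n  set {\n" + body + "\n  }\n}"


def batch_mutations(nquads: Iterable[str], batch_size: int = 2) -> Iterator[str]:
    """Yield one mutation query string per batch of N-Quad lines."""
    batch: List[str] = []
    for line in nquads:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the input
        batch.append(line)
        if len(batch) == batch_size:
            yield wrap(batch)
            batch = []
    if batch:
        yield wrap(batch)  # flush the final, possibly smaller batch


# Made-up sample data, for illustration only.
sample = [
    '<alice> <name> "Alice" .',
    '<bob> <name> "Bob" .',
    '<alice> <follows> <bob> .',
]
queries = list(batch_mutations(sample))
for query in queries:
    print(query)
```

Each string in queries would then be sent to the server as one request; batching keeps the number of round trips low without building one enormous query.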
Dgraph has been built from the ground up to run in Google-scale production environments. Serving terabytes of structured data on commodity hardware has been the main aim since the beginning. To achieve that, Dgraph needs to be able to deal with server failures without losing any data or dropping any queries.
We’ve implemented Raft, a distributed consensus algorithm, within Dgraph to handle such failures. Every write goes through a group of servers and is acknowledged to the client only once it has been applied to a majority of the servers in the group. Thus, even if some of these servers crash, the data can still be accessed and updated, and queries keep flowing without the user being affected.
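The majority rule is the heart of why a few crashes don't hurt. The sketch below shows only that rule, in Python; it is not Raft itself (no leader election, logs, or terms), and the function names are made up for this example.

```python
from typing import List


def majority(group_size: int) -> int:
    """Smallest number of servers that forms a majority of the group."""
    return group_size // 2 + 1


def is_acknowledged(applied: List[bool]) -> bool:
    """A write is acknowledged once a majority of the group has applied it."""
    return sum(applied) >= majority(len(applied))


# Group of 5 servers: the write reached 3 of them, 2 are down.
print(is_acknowledged([True, True, True, False, False]))   # True
# Only 2 of 5 applied the write, so the client is not acknowledged yet.
print(is_acknowledged([True, True, False, False, False]))  # False
```

With this rule, a group of 3 tolerates 1 failed server and a group of 5 tolerates 2, which is why odd group sizes are the usual choice.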
Also, new servers can be introduced into an existing cluster by providing the IP address of any healthy server in the cluster. They will automatically pick up the other members, pull in data shards from healthy nodes, join the groups and become part of the cluster.
Such functionality has never been an aim for graph databases in the past. But we strongly believe that graph databases can be run as primary databases, not just add-ons. To that end, it’s important that they be able to survive machine crashes and avoid data loss.
We believe this functionality makes Dgraph the most production-ready graph database on the market.
You can read more about running multiple instances of Dgraph, sharding, and distributing data in a cluster here. In fact, we created a small video to show you how Dgraph can recover from crashes directly after a load without losing any data.
Dgraph believes in standards. We heartily use the existing RDF NQuad standard for data input. Now, we’re also exporting our backups in RDF NQuad format. This makes it easy to feed Dgraph data into other systems, upgrade Dgraph versions, or switch over to another graph database. You can read more about backup here.
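For readers who haven't seen the format, here is a rough Python sketch of what turning edges into N-Quad lines looks like. It only illustrates the line shape; the exact output of Dgraph's backup is not shown in this post. The Edge layout, the helper names, and the film ID film.1 are assumptions made up for the example.

```python
from typing import List, Tuple

# (subject, predicate, object, object_is_node); a hypothetical layout.
Edge = Tuple[str, str, str, bool]


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a quoted literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_nquad(edge: Edge) -> str:
    """Format one edge as an N-Quad line (no graph label)."""
    subject, predicate, obj, is_node = edge
    target = f"<{obj}>" if is_node else f'"{escape_literal(obj)}"'
    return f"<{subject}> <{predicate}> {target} ."


edges: List[Edge] = [
    ("m.06pj8", "type.object.name.en", "Steven Spielberg", False),
    ("m.06pj8", "film.director.film", "film.1", True),  # film.1 is a made-up ID
]
for edge in edges:
    print(to_nquad(edge))
```

Because every line is a self-contained statement, an export in this shape can be streamed, split, or concatenated without any extra bookkeeping.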
Dgraph can now index various scalar values to allow sorting, term matching, and inequality filters. We support these value types: int, float, string, bool, date, datetime, geo, uid.
String values get tokenized using ICU, and allow allof and anyof functions. geo values allow for geo-indexing and support the geo functions mentioned below. A uid value is a way to indicate that the predicate edge points to another entity node. This is useful to generate edges in reverse direction automatically. The rest support sorting, equality and inequality functions.
If you want to generate the index for a particular predicate, mention its type and specify the @index keyword. Similarly, to generate reverses for a uid predicate, you can specify the @reverse keyword.
Here’s an example schema file for freebase film data:
scalar (
  film.director.film : uid @reverse
  film.film.genre : uid @reverse
  film.film.initial_release_date : date @index
  film.film.rating : uid @reverse
  loc : geo @index
  type.object.name.en : string @index
  type.object.name.hi : string @index
  type.object.name.ta : string @index
)
You can read more about Dgraph schema here.
Each graph edge is unidirectional. But a lot of times, you need to have both the forward and backward edges. For those cases, Dgraph now provides a @reverse keyword, which can be applied to predicates of uid type. This would trigger automatic generation of the reverse edges, to allow querying in the reverse direction.
You can read more about reverse edges here.
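Conceptually, generating reverse edges means storing each edge a second time, keyed by its target. Here is a small Python sketch of that idea; the function name, the data layout, and the film IDs are made up for illustration and say nothing about how Dgraph stores edges internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def build_reverse(edges: List[Edge]) -> Dict[Tuple[str, str], List[str]]:
    """Map (object, predicate) -> subjects, i.e. the edges walked backwards."""
    reverse = defaultdict(list)
    for subject, predicate, obj in edges:
        reverse[(obj, predicate)].append(subject)
    return dict(reverse)


# Forward edges: director -> film. The film IDs are hypothetical.
forward = [
    ("m.06pj8", "film.director.film", "film.1"),
    ("m.06pj8", "film.director.film", "film.2"),
]
reverse = build_reverse(forward)

# Walking the edge backwards: who directed film.1?
print(reverse[("film.1", "film.director.film")])
```

Without the reverse map, answering "who directed this film?" would mean scanning every director's forward edges.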
You can now sort the results by any indexed predicate (except string and geo, of course). For example, you can now get a list of films directed by Steven Spielberg sorted by initial release date.
{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film(order: film.film.initial_release_date) {
      type.object.name.en
      film.film.initial_release_date
    }
  }
}
# To sort in descending order, just use orderdesc instead of order.
You can read more about sorting here.
Functions are a great way to provide functionality with a simple and clear interface. Dgraph functions can be used as both starting points to queries and as filters.
Here’s a list of functions we introduced in v0.7.1:
anyof(predicate, "space separated list of terms"): Entities whose value for the predicate has any of the terms specified.
allof(predicate, "space separated list of terms"): Entities whose value for the predicate has all of the terms specified.
leq(predicate, "value"): Entities whose value for the predicate is less than or equal to specified value.
geq(predicate, "value"): Entities whose value for the predicate is greater than or equal to specified value.
Both anyof and allof functions work based on an index generated by tokenizing string values. We use ICU, which has vast support for human languages and is used by major projects like Google Web Search, Chrome, Mac OSX, Lucene, etc.
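To make the term index idea concrete, here is a minimal Python sketch of anyof and allof over an inverted index. The tokenizer is a plain lowercase split on non-word characters, a stand-in for ICU rather than an equivalent, and the entity IDs f1 to f3 are made up for the example.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on non-word characters (a stand-in for ICU)."""
    return {term for term in re.split(r"\W+", text.lower()) if term}


def build_index(values: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: term -> set of entities whose value contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity, value in values.items():
        for term in tokenize(value):
            index[term].add(entity)
    return index


def anyof(index: Dict[str, Set[str]], terms: str) -> Set[str]:
    """Entities matching at least one term: union of the posting sets."""
    result: Set[str] = set()
    for term in tokenize(terms):
        result |= index.get(term, set())
    return result


def allof(index: Dict[str, Set[str]], terms: str) -> Set[str]:
    """Entities matching every term: intersection of the posting sets."""
    postings = [index.get(term, set()) for term in tokenize(terms)]
    return set.intersection(*postings) if postings else set()


names = {
    "f1": "Indiana Jones and the Last Crusade",
    "f2": "Jurassic Park",
    "f3": "The Lost World: Jurassic Park",
}
index = build_index(names)
print(sorted(anyof(index, "jones park")))
print(sorted(allof(index, "jurassic park")))
```

Both functions only touch the posting sets for the query terms, so the cost depends on how many entities contain those terms, not on the total number of entities.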
Upcoming in v0.7.2:
eq(predicate, "value"): Entities whose value for predicate is equal to specified value.
le(predicate, "value"): Entities whose value for predicate is less than specified value.
ge(predicate, "value"): Entities whose value for predicate is greater than specified value.
To enable these functions, you need to provide a schema that specifies the scalar type for the predicates. For example:
scalar(
  name : string @index # anyof, allof
  age : int @index # leq, geq
  date_of_birth : date @index # leq, geq
)
You can read more about functions here.
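Inequality functions like leq and geq are cheap once values are kept in sorted order. The Python sketch below shows the idea with a sorted list and binary search; it is an illustration under that assumption, not a description of Dgraph's index layout, and the names and ages are made up.

```python
import bisect
from typing import Dict, List, Tuple

SortedIndex = List[Tuple[int, str]]  # (value, entity), sorted by value


def build_sorted_index(values: Dict[str, int]) -> SortedIndex:
    """Sort (value, entity) pairs so range lookups can use binary search."""
    return sorted((value, entity) for entity, value in values.items())


def leq(index: SortedIndex, value: int) -> List[str]:
    """Entities whose value is less than or equal to the given value."""
    keys = [v for v, _ in index]
    end = bisect.bisect_right(keys, value)
    return [entity for _, entity in index[:end]]


def geq(index: SortedIndex, value: int) -> List[str]:
    """Entities whose value is greater than or equal to the given value."""
    keys = [v for v, _ in index]
    start = bisect.bisect_left(keys, value)
    return [entity for _, entity in index[start:]]


ages = {"alice": 30, "bob": 25, "carol": 35}
age_index = build_sorted_index(ages)
print(leq(age_index, 30))
print(geq(age_index, 30))
```

The same sorted order is what makes sorting results by an indexed predicate natural: the index already holds the entities in value order.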
Geospatial queries are important for building location-aware search and recommendations. Being able to find nearby restaurants that serve sushi, bars in the city that play jazz, or friends who are visiting your neighborhood is crucial to such use cases. We couldn’t imagine building a database without providing full-fledged support for geo queries, and in v0.7.1, we added geo indexing.
Dgraph now supports four geospatial functions:
near(predicate, geo-location): Finds all entities lying within a specified distance from a point.
within(predicate, geo-polygon): Finds all entities lying within a specified region.
contains(predicate, geo-location): Finds all enclosures for a specified point or region.
intersects(predicate, geo-polygon): Finds all entities which intersect a specified region.
$ curl localhost:8080/query -XPOST -d $'
{
  tourist( near("loc", "{\'type\':\'Point\', \'coordinates\': [-122.469829, 37.771935]}", "1000" ) ) {
    name
  }
}' | python -m json.tool | grep name
"name": "Steinhart Aquarium"
"name": "Spreckels Temple of Music"
"name": "Pioneer Log Cabin"
"name": "Conservatory of Flowers"
"name": "De Young Museum"
"name": "Chinese Pavillion"
"name": "Japanese Tea Garden"
"name": "Peace Lantern"
"name": "San Francisco Botanical Garden"
"name": "Morrison Planetarium"
"name": "California Academy of Sciences"
"name": "Hamon Tower"
"name": "National AIDS Memorial Grove"
"name": "La Rose des Vents"
"name": "Strawberry Hill"
"name": "Buddha"
"name": "Rose Garden"
You can read more about Geolocation functionality here.
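To show what a near query computes, here is a small Python sketch that filters points by great-circle distance using the haversine formula. It is a linear scan over made-up places, not a geo index, and it assumes the distance argument ("1000" in the query above) is in meters. Coordinates follow the GeoJSON order of longitude first, then latitude.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

Point = Tuple[float, float]  # (longitude, latitude), as in GeoJSON


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in meters."""
    lon1, lat1 = a
    lon2, lat2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def near(places: Dict[str, Point], center: Point, max_distance_m: float) -> List[str]:
    """Names of all places within max_distance_m of center (linear scan)."""
    return [name for name, loc in places.items()
            if haversine_m(center, loc) <= max_distance_m]


center = (-122.469829, 37.771935)  # the point used in the query above
places = {  # hypothetical places, for illustration only
    "place_a": (-122.4698, 37.7720),  # a few meters from the center
    "place_b": (-122.4000, 37.8000),  # several kilometers away
}
print(near(places, center, 1000))
```

A real geo index avoids the linear scan by bucketing locations into cells and only checking the cells that overlap the search circle.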
Dgraph now supports fairly advanced and complex filtering operations. We support both && (and) and || (or) filters, using round brackets to specify the order of evaluation. For example, (A || (B && C)), as demonstrated in the following query.
{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film @filter(
        allof("type.object.name.en", "jones indiana") ||
        (anyof("type.object.name.en", "jurassic") && anyof("type.object.name.en", "park"))) {
      type.object.name.en
      film.film.initial_release_date
    }
  }
}
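The filter in this query has the shape A || (B && C). As a rough Python sketch of how such a filter decides which films to keep, the code below applies the same expression to a few sample titles. The helper functions use a plain lowercase whitespace split rather than ICU, and the title list is sample data chosen for the example.

```python
from typing import List, Set


def terms(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (a stand-in for ICU)."""
    return set(text.lower().split())


def allof(title: str, query: str) -> bool:
    """True if the title contains every term in the query."""
    return terms(query) <= terms(title)


def anyof(title: str, query: str) -> bool:
    """True if the title contains at least one term in the query."""
    return bool(terms(query) & terms(title))


def keep(title: str) -> bool:
    """Same shape as the filter above: A || (B && C)."""
    return allof(title, "jones indiana") or (
        anyof(title, "jurassic") and anyof(title, "park")
    )


titles: List[str] = [
    "Indiana Jones and the Temple of Doom",
    "Jurassic Park",
    "Jaws",
]
kept = [title for title in titles if keep(title)]
print(kept)
```

The brackets matter: without them, a different grouping such as (A || B) && C would also require "park" in the Indiana Jones titles and drop them.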
With that, I’ll finish the blog post. If you found these features interesting, try out the 5-step tutorial to get started with Dgraph. Let us know what you think!
I’ll be giving a talk about Dgraph at the Go Meetup in Sydney on 19th Jan and at Gophercon India on 24-25th Feb. So, come talk to me about Dgraph.