A hands-on introduction to Dgraph core concepts.
The hands-on examples are a way to better understand each concept by experimenting directly with Dgraph. They are not a substitute for the product documentation.
You can perform all the steps of this post using a local Learning Environment, with a Dgraph instance and the Ratel UI running in Docker containers.
You can also start in seconds by provisioning a Dgraph Cloud instance:
In the Dgraph Cloud console, click Launch new backend.
Select a Dedicated instance and the region that meets your requirements.
For this blog, we will work without a schema (more on this in the next episode), so we need to set the schema mode to flexible. This setting is only available to Dedicated clusters.
When your cluster is created, in the Dgraph Cloud console, click Settings and set Schema mode to flexible.
Ratel is a graphical data visualization tool. On your cloud instance, open “Ratel” from the left menu.
We will also use the curl client, and the JSON processor jq to display query results in a readable form.
Dgraph is all about interconnected data. The W3C uses the term “Semantic Web” to refer to the Web of linked data, and RDF is one of the Semantic Web standards for data interchange. RDF 1.1 introduced a very simple yet powerful representation allowing structured and semi-structured data to be mixed, exposed, and shared across different applications. Its main focus is to name the relationships between things as well as the two ends of the link (this is usually referred to as a “triple”). Dgraph uses a simplified version of the standard. It is so simple that it takes the form of a line of text with 4 elements and a final dot, all separated by a space. It is important to understand how powerful and transformative this simple approach is, as it is one of the underlying principles of Dgraph.
Let’s look at an example.
<_:jedi1> <character_name> "Luke Skywalker" .
<_:leia> <character_name> "Leia" .
<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
<_:sith1> <has_for_child> <_:jedi1> .
<_:sith1> <has_for_child> <_:leia> .
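The 4 elements of the notation, as they appear in these lines, are the subject, the predicate, the object and, optionally, a list of facets in parentheses.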
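Those lines could be read as:
The thing referred to as ‘jedi1’ has a character_name “Luke Skywalker”.
The thing referred to as ‘leia’ has a character_name “Leia”.
The thing referred to as ‘sith1’ has a character_name “Anakin”. The character_name of ‘sith1’ has a characteristic ‘aka’ equal to “Darth Vador” and “villain” equal to true.
The thing referred to as ‘sith1’ has a relation has_for_child with the thing referred to as ‘jedi1’.
The thing referred to as ‘sith1’ has a relation has_for_child with the thing referred to as ‘leia’.
Comments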
We can see those simple lines as a list of facts. They represent certain information and knowledge (at one point in time it was even a revelation).
We will save those facts directly in Dgraph.
As you can store facts, aka knowledge, in Dgraph as a graph, the term “knowledge graph” is sometimes used.
We have used the term thing for the subject because nothing is enforcing a specific semantic for the subject. As a generic term, we prefer node or entity rather than thing.
The notation _:jedi1 is called a blank node in the RDF specification. It is a temporary identifier of the node. It means that we don’t have a better way to reference the node we are talking about, but as we need to reference the same node in the next lines, as subject or object, we just refer to it as <_:jedi1> in this group of lines.
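The object part may be an entity, as in <_:sith1> <has_for_child> <_:jedi1> . In that case it’s natural to see the predicate as a relationship.
The object part may be a literal value, as in <_:jedi1> <character_name> "Luke Skywalker" . In that case it’s natural to see the predicate as an attribute.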
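To make the anatomy of a line concrete, here is a minimal Python sketch that splits one line into its parts. The regular expression and the `parse_fact` helper are illustrative assumptions, not Dgraph's own parser: they only cover the shapes used in this post (a node or a quoted literal as object, and optional facets whose values contain no comma).

```python
import re

# <subject> <predicate> (<node> | "literal") [(facets)] .
LINE_RE = re.compile(
    r'^\s*<(?P<subject>[^>]+)>\s+'
    r'<(?P<predicate>[^>]+)>\s+'
    r'(?:<(?P<node>[^>]+)>|"(?P<literal>[^"]*)")'
    r'(?:\s*\((?P<facets>[^)]*)\))?'
    r'\s*\.\s*$'
)


def parse_fact(line):
    """Split one line of the simplified notation into its elements."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognized fact: {line!r}")

    # Facets are key=value pairs; values are kept as plain strings here.
    facets = {}
    if match.group("facets"):
        for pair in match.group("facets").split(","):
            key, _, value = pair.partition("=")
            facets[key.strip()] = value.strip().strip('"')

    # An object written as <...> is another node, otherwise it is a literal.
    is_relation = match.group("node") is not None
    return {
        "subject": match.group("subject"),
        "predicate": match.group("predicate"),
        "object": match.group("node") if is_relation else match.group("literal"),
        "kind": "relationship" if is_relation else "attribute",
        "facets": facets,
    }


fact = parse_fact('<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .')
link = parse_fact('<_:sith1> <has_for_child> <_:jedi1> .')
print(fact["kind"], fact["facets"])
print(link["kind"], link["object"])
```

The `kind` field only reflects the reading suggested above: a node as object reads as a relationship, a literal as an attribute.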
Let’s play with Dgraph
In Dgraph Query Language (DQL), operations modifying the data (add, delete or update) are called mutations; operations reading the data are simply called queries.
You may use one of the available DQL clients to execute mutations and queries. We will illustrate this post using the raw HTTP client and the Ratel UI.
Add data using an HTTP client
curl "localhost:8080/mutate?commitNow=true" \
-s -H "Content-Type: application/rdf" -X POST -d $'
{
set {
<_:jedi1> <character_name> "Luke Skywalker" .
<_:leia> <character_name> "Leia" .
<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
<_:sith1> <has_for_child> <_:jedi1> .
<_:sith1> <has_for_child> <_:leia> .
}
}
' | jq
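For readers who prefer a script to a shell one-liner, the same request can be sketched with the Python standard library. The endpoint, query parameter and content type are copied from the curl command above; the `build_mutation_request` and `send` helpers are hypothetical names introduced for this illustration, and the default base URL assumes the local instance.

```python
import json
import urllib.request

FACTS = [
    '<_:jedi1> <character_name> "Luke Skywalker" .',
    '<_:leia> <character_name> "Leia" .',
    '<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .',
    '<_:sith1> <has_for_child> <_:jedi1> .',
    '<_:sith1> <has_for_child> <_:leia> .',
]


def build_mutation_request(rdf_lines, base_url="http://localhost:8080"):
    """Wrap the RDF lines in a set block and prepare the POST request."""
    body = "{\n  set {\n"
    body += "\n".join("    " + line for line in rdf_lines)
    body += "\n  }\n}\n"
    return urllib.request.Request(
        url=base_url + "/mutate?commitNow=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running instance)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_mutation_request(FACTS)
print(request.get_method(), request.full_url)
```

Building the request does not touch the network; calling `send(request)` against a running instance should return the same kind of JSON document that curl prints.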
Add data using a mutation in Ratel
Alternatively, you can use the Ratel console: select the Mutate tab and enter
{
set {
<_:jedi1> <character_name> "Luke Skywalker" .
<_:leia> <character_name> "Leia" .
<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
<_:sith1> <has_for_child> <_:jedi1> .
<_:sith1> <has_for_child> <_:leia> .
}
}
and hit RUN.
Check the JSON response (in Ratel, select the response tab)
{
"data": {
"code": "Success",
"message": "Done",
"queries": null,
"uids": {
"jedi1": "0x1",
"leia": "0x2",
"sith1": "0x3"
}
},
...
Dgraph has successfully saved the facts, and it also tells us that it has assigned unique identifiers to the blank nodes that we provided. We can use those identifiers to add or change facts about the entities.
Just copy the jedi1 identifier (0x1 in our example) and run another mutation.
curl "localhost:8080/mutate?commitNow=true" \
-s -H "Content-Type: application/rdf" -X POST -d $'
{
set {
<0x1> <eye_color> "blue" .
}
}
' | jq
In Ratel, just copy/paste the following mutation in the Mutate tab and hit RUN.
{
set {
<0x1> <eye_color> "blue" .
}
}
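The copy-and-paste step can also be done programmatically. The sketch below reads the identifier assigned to a blank node from a mutation response and builds the follow-up fact. The response text is trimmed to the fields shown in this post, and `uid_of` and `fact_about` are hypothetical helper names.

```python
import json

# Mutation response, trimmed to the fields shown in this post.
RESPONSE_TEXT = """
{
  "data": {
    "code": "Success",
    "message": "Done",
    "queries": null,
    "uids": {"jedi1": "0x1", "leia": "0x2", "sith1": "0x3"}
  }
}
"""


def uid_of(response, blank_node):
    """Return the identifier Dgraph assigned to a blank node name."""
    return response["data"]["uids"][blank_node]


def fact_about(uid, predicate, value):
    """Build one RDF line about an existing node; json.dumps quotes the literal."""
    return f"<{uid}> <{predicate}> {json.dumps(value)} ."


response = json.loads(RESPONSE_TEXT)
line = fact_about(uid_of(response, "jedi1"), "eye_color", "blue")
print(line)  # <0x1> <eye_color> "blue" .
```

The identifiers are whatever your own instance returned, so the value behind `jedi1` may differ from `0x1`.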
It’s time to retrieve information from Dgraph using a query.
Query in Ratel
Select Query, paste the following request and hit RUN:
{
characters(func:has(character_name)) {
character_name @facets
eye_color
has_for_child { character_name }
}
}
Select the Graph tab to display the result … Et voilà.
Your first graph shows three entities and two relations.
If needed, move the nodes in the visualization to better see the relation name.
Select Luke to display the panel with the attributes for this node.
If you are curious, click on the JSON tab: it displays the query response in JSON format:
{
"data": {
"characters": [
{
"character_name": "Luke Skywalker",
"eye_color": "blue"
},
{
"character_name": "Leia"
},
{
"character_name|aka": "Darth Vador",
"character_name|villain": true,
"character_name": "Anakin",
"has_for_child": [
{
"character_name": "Luke Skywalker"
},
{
"character_name": "Leia"
}
]
}
]
}
,...
Query using an HTTP client
curl "localhost:8080/query" -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{
characters(func:has(character_name)) {
character_name @facets
eye_color
has_for_child { character_name }
}
}' | jq
The command outputs the result as a JSON structure.
We will dig into that later, but the most remarkable point here is that the response has exactly the structure of the query. This makes it a very powerful tool for client applications, as they always know the structure of the response, even with dynamically created queries. This capability is referred to as being “declarative”: we declare what we are interested in.
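Because the response mirrors the query, client code can walk it with plain dictionary and list access. The following sketch does that for the response of this post. The sample data is the response body written as a Python dictionary, with the facet keys named as in the mutation, and `summarize` is a hypothetical helper.

```python
# The query response of this post, as a Python dictionary.
RESPONSE = {
    "data": {
        "characters": [
            {"character_name": "Luke Skywalker", "eye_color": "blue"},
            {"character_name": "Leia"},
            {
                "character_name|aka": "Darth Vador",
                "character_name|villain": True,
                "character_name": "Anakin",
                "has_for_child": [
                    {"character_name": "Luke Skywalker"},
                    {"character_name": "Leia"},
                ],
            },
        ]
    }
}


def summarize(response):
    """Follow the shape of the query: one entry per character."""
    summary = {}
    for character in response["data"]["characters"]:
        name = character["character_name"]
        summary[name] = {
            # Predicates missing on a node are simply absent from the response.
            "eye_color": character.get("eye_color"),
            "children": [
                child["character_name"]
                for child in character.get("has_for_child", [])
            ],
            # Facets come back as "<predicate>|<facet>" keys.
            "facets": {
                key.split("|", 1)[1]: value
                for key, value in character.items()
                if key.startswith("character_name|")
            },
        }
    return summary


summary = summarize(RESPONSE)
print(summary["Anakin"]["children"])
```

The block named `characters` in the query is the key under `data`, and each nested block of the query becomes a nested list in the response.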
This query can be understood as:
Find all the entities having a predicate character_name.
I want the character_name of the found entities with all the attached characteristics (facets).
Those entities may have an eye_color, so give me that info too.
Those entities may have a has_for_child predicate. If it exists it links to another entity and I want to know the character_name of that entity.
Questions
What happened to my identifier _:jedi1?
<_:jedi1> was a temporary identifier. It is valid in the context of the transaction: all RDF lines in the same transaction referencing <_:jedi1> are referencing the same entity. Dgraph has generated a unique id for it and it was returned when we submitted the mutation. The ‘jedi1’ identifier is not saved by Dgraph.
If you want to keep it, you can add a triple to the transaction to save the “fact” that jedi1 is your identifier for this entity.
Simply add
<_:jedi1> <identifier> "jedi1" .
Note: the convention is to use “xid” for external id as the predicate.
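If you want to do this for every blank node of a mutation, the extra lines can be generated. The sketch below is one possible way, assuming the “xid” convention mentioned above; `with_xids` is a hypothetical helper, not part of any Dgraph client.

```python
import re

BLANK_NODE = re.compile(r"<_:([^>]+)>")

FACTS = [
    '<_:jedi1> <character_name> "Luke Skywalker" .',
    '<_:leia> <character_name> "Leia" .',
    '<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .',
    '<_:sith1> <has_for_child> <_:jedi1> .',
    '<_:sith1> <has_for_child> <_:leia> .',
]


def with_xids(rdf_lines, predicate="xid"):
    """Append one line per blank node that stores its name as an external id."""
    names = []
    for line in rdf_lines:
        for name in BLANK_NODE.findall(line):
            if name not in names:  # keep the order of first appearance
                names.append(name)
    extra = [f'<_:{name}> <{predicate}> "{name}" .' for name in names]
    return list(rdf_lines) + extra


for line in with_xids(FACTS)[-3:]:
    print(line)
```

The generated lines must be sent in the same mutation as the original facts, since the blank node names are only meaningful within that transaction.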
What if I run the mutation again?
If you submit the mutation
{
set {
<_:sith1> <character_name> "Darth Vador".
<_:jedi1> <character_name> "Luke Skywalker" .
<_:sith1> <has_for_child> <_:jedi1> .
}
}
again, Dgraph will see temporary identifiers and will therefore generate new entities with new internal ids for them. To avoid creating duplicate information, you have to check the existence of the entities, using the external id or any other criteria, before adding the information. This is done with an upsert mutation.
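As a rough illustration of the idea, the sketch below generates the text of a DQL upsert block that looks an entity up by its external id before writing facts about it. It is written under several assumptions: the external id is stored in an `xid` predicate, that predicate has an index so that `eq` can be used on it (which requires a schema declaration not covered in this post), and `build_upsert` is a hypothetical helper. Check the upsert block documentation before relying on it.

```python
import json


def build_upsert(xid, facts):
    """Return the text of an upsert block for one entity.

    facts maps a predicate name to a literal string value.
    """
    quoted_xid = json.dumps(xid)
    set_lines = [f"      uid(v) <xid> {quoted_xid} ."]
    for predicate, value in facts.items():
        set_lines.append(f"      uid(v) <{predicate}> {json.dumps(value)} .")

    return (
        "upsert {\n"
        "  query {\n"
        # Look for an existing node carrying this external id.
        f"    q(func: eq(xid, {quoted_xid})) {{\n"
        "      v as uid\n"
        "    }\n"
        "  }\n"
        "  mutation {\n"
        "    set {\n"
        + "\n".join(set_lines)
        + "\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


body = build_upsert("jedi1", {"character_name": "Luke Skywalker"})
print(body)
```

When the query finds a node, `uid(v)` refers to it and the facts are written on the existing entity; when it finds none, a new entity is created. The resulting text is meant to be posted to the same mutate endpoint as the other mutations.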
In flexible mode, facts can be added without declaring a schema, making Dgraph storage extremely flexible. We will see why one would eventually need a schema in the next episode.
The facts describe entities having attributes and relations between the entities.
The result can be drawn as nodes and edges: the kind of drawing we do when we sketch relations between things.
References
https://www.w3.org/TR/n-quads/
Photo by cottonbro studio