At the beginning are the triples

A hands-on introduction to Dgraph core concepts.

The hands-on examples are a way to better understand each concept by experimenting directly with Dgraph. They are not a substitute for the product documentation.

Prerequisite

Get a Dgraph cluster up and running

Self-managed cluster

You can perform all the steps of this post using a local Learning Environment, with a Dgraph instance and the Ratel UI running in Docker containers.

Use a Dgraph Cloud cluster

You can start in seconds by provisioning a Dgraph Cloud instance.

In the Dgraph Cloud console, click Launch new backend.

  • Select a Dedicated instance and the region that meets your requirements.
  • Type a name for your Dgraph Cloud instance.
  • Click Launch.

For this post, we will work without a schema (more on this in the next episode), so we need to set the schema mode to flexible. This setting is only available on Dedicated clusters.

When your cluster is created, in the Dgraph Cloud console, click Settings and set Schema mode to flexible.

Configure 'flexible' schema.

Access Ratel UI

Ratel is a graphical data visualization tool. On your cloud instance, open “Ratel” from the left menu.

Other tools

We will use the curl client, and the JSON processor jq to display query results in a readable form.

Episode 1 - At the beginning are the triples

Dgraph is all about interconnected data. The W3C uses the term “Semantic Web” to refer to the Web of linked data, and RDF is one of the Semantic Web standards for data interchange. RDF 1.1 defines a very simple yet powerful representation that allows structured and semi-structured data to be mixed, exposed, and shared across different applications. Its main focus is to name the relationship between two things as well as the two ends of the link; this is usually referred to as a “triple”. Dgraph uses a simplified version of the standard: each fact is a single line of text made of up to 4 elements, separated by spaces and terminated by a dot. It is important to understand how powerful and transformative this simple approach is, as it is one of the underlying principles of Dgraph.

Let’s look at an example.

<_:jedi1> <character_name> "Luke Skywalker" .
<_:leia> <character_name> "Leia" .
<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
<_:sith1> <has_for_child> <_:jedi1> .
<_:sith1> <has_for_child> <_:leia> .

The 4 elements of the notation are:

  • an identifier of the ‘thing’ we are talking about: the subject,
  • a predicate,
  • a literal value, or an identifier of another ‘thing’: the object,
  • an optional list of characteristics associated with the predicate: the facets.

Those lines could be read as

  • There is a first thing, that we refer to as ‘jedi1’, having a character_name “Luke Skywalker”,
  • There is a second thing, that we refer to as ‘leia’, having a character_name “Leia”,
  • There is a thing that we refer to as ‘sith1’, having a character_name “Anakin”. The character_name of ‘sith1’ has a characteristic ‘aka’ equal to “Darth Vador” and a characteristic ‘villain’ equal to true.
  • The thing referred to as ‘sith1’ has a relation has_for_child with the thing referred to as ‘jedi1’.
  • The thing referred to as ‘sith1’ has a relation has_for_child with the thing referred to as ‘leia’.

Comments

  1. We can see those simple lines as a list of facts. They represent certain information and knowledge (at one point in time it was even a revelation). We will save those facts directly in Dgraph. Because facts, in other words knowledge, are stored in Dgraph as a graph, the term “knowledge graph” is sometimes used.

  2. We have used the term thing for the subject because nothing enforces specific semantics for the subject. As a generic term, we prefer node or entity to thing.

  3. The notation _:jedi1 is called a blank node in the RDF specification. It is a temporary identifier for the node. It means that we don’t have a better way to reference the node we are talking about, but because we need to reference the same node in the following lines, as subject or object, we simply refer to it as <_:jedi1> within this group of lines.

  4. The object part may be an entity, as in <_:sith1> <has_for_child> <_:jedi1>. In that case it is natural to see the predicate as a relationship. The object part may also be a literal value, as in <_:jedi1> <character_name> "Luke Skywalker". In that case, we understand the predicate as an attribute of the subject node.
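To make the four elements concrete, here is a minimal Python sketch that splits one of the lines above into subject, predicate, object and facets. It is only an illustration under simplifying assumptions: the regular expression and the `parse_triple` helper are hypothetical names, they only cover the shapes used in this post, and they are in no way the parser Dgraph uses.

```python
import re

# Simplified pattern for the lines used in this post. It is NOT a full
# N-Quads parser: no escapes, language tags, typed literals or graph labels.
TRIPLE = re.compile(
    r'^\s*<(?P<subject>[^>]+)>\s+'                   # <_:sith1>
    r'<(?P<predicate>[^>]+)>\s+'                     # <character_name>
    r'(?:<(?P<node>[^>]+)>|"(?P<literal>[^"]*)")'    # <_:jedi1> or "Anakin"
    r'(?:\s*\((?P<facets>[^)]*)\))?'                 # optional (aka="...",villain=true)
    r'\s*\.\s*$'                                     # final dot
)


def parse_triple(line):
    """Split one line into its subject, predicate, object and facets."""
    match = TRIPLE.match(line)
    if match is None:
        raise ValueError(f"not a recognised triple: {line!r}")

    # Facet values are kept as plain strings in this sketch (true -> "true").
    facets = {}
    if match.group("facets"):
        for pair in match.group("facets").split(","):
            key, _, value = pair.partition("=")
            facets[key.strip()] = value.strip().strip('"')

    is_relationship = match.group("node") is not None
    return {
        "subject": match.group("subject"),
        "predicate": match.group("predicate"),
        "object": match.group("node") if is_relationship else match.group("literal"),
        "kind": "relationship" if is_relationship else "attribute",
        "facets": facets,
    }


for line in [
    '<_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .',
    '<_:sith1> <has_for_child> <_:jedi1> .',
]:
    print(parse_triple(line))
```

The `kind` field mirrors comment 4: an object that is another node makes the predicate a relationship, while a literal object makes it an attribute.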

Let’s play with Dgraph

Add data using a mutation

In Dgraph Query Language (DQL), operations modifying the data (add, delete or update) are called mutations; operations reading the data are simply called queries.

You may use one of the available DQL clients to execute mutations and queries. We will illustrate this post using a raw HTTP client (curl) and the Ratel UI.

Add data using an HTTP client

curl "localhost:8080/mutate?commitNow=true" \
-s -H "Content-Type: application/rdf" -X POST  -d $'
{
  set {
    <_:jedi1> <character_name> "Luke Skywalker" .
    <_:leia> <character_name> "Leia" .
    <_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
    <_:sith1> <has_for_child> <_:jedi1> .
    <_:sith1> <has_for_child> <_:leia> .
  }
}
' | jq
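The same HTTP call can be sketched in Python with only the standard library. This is a rough equivalent of the curl command above, not an official client: `build_mutation_request` and `send` are hypothetical helper names, and the sketch assumes Dgraph is reachable at `localhost:8080` as in the curl example.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl example above.
DGRAPH_URL = "http://localhost:8080"

MUTATION = """
{
  set {
    <_:jedi1> <character_name> "Luke Skywalker" .
    <_:leia> <character_name> "Leia" .
    <_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
    <_:sith1> <has_for_child> <_:jedi1> .
    <_:sith1> <has_for_child> <_:leia> .
  }
}
"""


def build_mutation_request(rdf_body, base_url=DGRAPH_URL):
    """Build (but do not send) the POST request that curl sends above."""
    return urllib.request.Request(
        url=f"{base_url}/mutate?commitNow=true",
        data=rdf_body.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running Dgraph)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_mutation_request(MUTATION)
print(request.get_method(), request.full_url)
# With a running instance: print(json.dumps(send(request), indent=2))
```

Building the request separately from sending it keeps the sketch runnable without a cluster; the call to `send` is left commented out for that reason.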

Add data using a mutation in Ratel

Alternatively, you can use the Ratel console: select the Mutate tab

Mutate data in Ratel

and enter

{
  set {
    <_:jedi1> <character_name> "Luke Skywalker" .
    <_:leia> <character_name> "Leia" .
    <_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
    <_:sith1> <has_for_child> <_:jedi1> .
    <_:sith1> <has_for_child> <_:leia> .
  } 
}

and hit RUN.

Check the JSON response (in Ratel, select the response tab)

{
  "data": {
    "code": "Success",
    "message": "Done",
    "queries": null,
    "uids": {
      "jedi1": "0x1",
      "leia": "0x2",
      "sith1": "0x3"
    }
  },
  ...

Dgraph has successfully saved the facts, and it also tells us that it has assigned unique identifiers to the blank nodes that we provided. We can use those identifiers to add or change facts about the entities.

Copy the jedi1 identifier (0x1 in our example) and run another mutation.

curl "localhost:8080/mutate?commitNow=true" \
-s -H "Content-Type: application/rdf" -X POST  -d $'
{
  set {
    <0x1> <eye_color> "blue" .
  } 
}
' | jq

In Ratel, copy and paste the following mutation in the Mutate tab and hit RUN.

{
  set {
    <0x1> <eye_color> "blue" .
  } 
}
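The copy-and-paste step above can also be scripted. The following Python sketch reads the uid assigned to a blank node from the mutation response and builds the follow-up mutation. The response text is the sample shown earlier, trimmed to the fields displayed in this post; `uid_for` and `set_string_attribute` are hypothetical helper names, and `json.dumps` is used only as a convenient way to quote simple string values.

```python
import json

# Sample mutation response, trimmed to the fields shown in this post.
RESPONSE_TEXT = """
{
  "data": {
    "code": "Success",
    "message": "Done",
    "queries": null,
    "uids": {"jedi1": "0x1", "leia": "0x2", "sith1": "0x3"}
  }
}
"""


def uid_for(response_text, blank_node):
    """Return the uid assigned to a blank node label (given without '_:')."""
    return json.loads(response_text)["data"]["uids"][blank_node]


def set_string_attribute(uid, predicate, value):
    """Build a set mutation that attaches a string attribute to a known uid."""
    # Assumption: JSON quoting is good enough for the simple strings used here.
    literal = json.dumps(value)
    return "{\n  set {\n    <%s> <%s> %s .\n  }\n}" % (uid, predicate, literal)


luke_uid = uid_for(RESPONSE_TEXT, "jedi1")
print(set_string_attribute(luke_uid, "eye_color", "blue"))
```

The uids on your own cluster will differ from the sample, which is the reason to read them from the response rather than hard-code them.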

It’s time to retrieve information from Dgraph using a query.

Query in Ratel

Select the Query tab, paste the following request and hit RUN:

{
  characters(func:has(character_name)) {
    character_name @facets
    eye_color
    has_for_child { character_name }
  }
}

Select the Graph tab to display the result … Et voilà.

Triples represented as a Graph.

Your first graph shows three entities and two relations.

If needed, move the nodes in the visualization to better see the relation name.

Select Luke to display the panel with the attributes for this node.

If you are curious, click the JSON tab: it displays the query response in JSON format:

{
  "data": {
    "characters": [
      {
        "character_name": "Luke Skywalker",
        "eye_color": "blue"
      },
      {
        "character_name": "Leia"
      },
      {
        "character_name|aka": "Darth Vador",
        "character_name|villain": true,
        "character_name": "Anakin",
        "has_for_child": [
          {
            "character_name": "Luke Skywalker"
          },
          {
            "character_name": "Leia"
          }
        ]
      }
    ]
  }
,...

Query using an HTTP client

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{
  characters(func:has(character_name)) {
    character_name @facets
    eye_color
    has_for_child { character_name }
  }
}' | jq

The command outputs the result as a JSON structure.

We will dig into that later, but the most remarkable point here is that the response has exactly the structure of the query. This makes it a very powerful tool for client applications, as they always know the structure of the response even with dynamically created queries. This capability is referred to as being “declarative”: we declare what we are interested in.

This query can be understood as:

  • Build a list called ‘characters’ with all the entities having a predicate character_name.
  • Tell me the character_name of the found entities with all the attached characteristics (facets).
  • I know that such entities may have information about eye_color so give me that info too.
  • I’m also interested in the has_for_child predicate. If it exists it links to another entity and I want to know the character_name of that entity.
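Because the response mirrors the query, client code can walk it with the same field names. Here is a minimal Python sketch that does so on the sample response shown in this post (with the facet key written as `character_name|villain`, matching the facet name used in the mutation). The `summarize` helper is a hypothetical name for illustration; fields missing from an entity, such as eye_color for Leia, are simply absent from the JSON.

```python
import json

# Sample query response from this post, trimmed to the "data" part.
RESPONSE_TEXT = """
{
  "data": {
    "characters": [
      {"character_name": "Luke Skywalker", "eye_color": "blue"},
      {"character_name": "Leia"},
      {
        "character_name|aka": "Darth Vador",
        "character_name|villain": true,
        "character_name": "Anakin",
        "has_for_child": [
          {"character_name": "Luke Skywalker"},
          {"character_name": "Leia"}
        ]
      }
    ]
  }
}
"""


def summarize(character):
    """Flatten one entry of the 'characters' list into a small summary."""
    prefix = "character_name|"  # facets come back as "<predicate>|<facet>" keys
    facets = {
        key[len(prefix):]: value
        for key, value in character.items()
        if key.startswith(prefix)
    }
    children = [child["character_name"] for child in character.get("has_for_child", [])]
    return {
        "name": character["character_name"],
        "eye_color": character.get("eye_color"),  # None when the fact is missing
        "facets": facets,
        "children": children,
    }


characters = json.loads(RESPONSE_TEXT)["data"]["characters"]
summaries = [summarize(character) for character in characters]
for summary in summaries:
    print(summary)
```

The list name `characters` and the nested `has_for_child` block come straight from the query, which is what lets the client navigate the response without inspecting it first.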

Questions

What happened to my identifier _:jedi1?

<_:jedi1> was a temporary identifier. It is valid in the context of the transaction: all RDF lines in the same transaction that reference <_:jedi1> reference the same entity. Dgraph generated a unique id for it and returned that id when we submitted the mutation. The ‘jedi1’ identifier itself is not saved by Dgraph. If you want to keep it, you can add a triple to the transaction that saves the “fact” that jedi1 is your identifier for this entity:

<_:jedi1> <identifier> "jedi1" .

Note: by convention, the predicate used to store an external id is named “xid”.

What if I run the mutation again?

If you submit the mutation

{
  set {
    <_:sith1> <character_name> "Darth Vador" .
    <_:jedi1> <character_name> "Luke Skywalker" .
    <_:sith1> <has_for_child> <_:jedi1> .
  } 
}

again, Dgraph will see temporary identifiers and will therefore create new entities with new internal ids for them. To avoid creating duplicate information, you have to check whether the entities already exist, using the external id or any other criteria, before adding the information. This is done with an upsert mutation.
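As an illustration, the Python sketch below builds the text of a DQL upsert block that looks an entity up by external id before writing to it. Several things here are assumptions and not part of this post: the `xid` predicate must actually be stored on the node, and the `eq` function requires an index on that predicate (for example a schema entry such as `xid: string @index(exact) .`), so it does not work on a completely schema-less predicate. `build_upsert` is a hypothetical helper name, and `json.dumps` is used only to quote simple strings.

```python
import json


def build_upsert(xid, character_name):
    """Build an upsert block: reuse the node with this xid, or create it."""
    xid_literal = json.dumps(xid)
    name_literal = json.dumps(character_name)
    # 'v' captures the uid of the matching node, if any. When nothing
    # matches, uid(v) in the set block creates a new node instead.
    return f"""upsert {{
  query {{
    q(func: eq(xid, {xid_literal})) {{
      v as uid
    }}
  }}

  mutation {{
    set {{
      uid(v) <xid> {xid_literal} .
      uid(v) <character_name> {name_literal} .
    }}
  }}
}}"""


print(build_upsert("jedi1", "Luke Skywalker"))
```

The resulting text can be posted to the same `/mutate?commitNow=true` endpoint with the `application/rdf` content type used earlier. Running it twice should then update the same node instead of creating a second one, provided the index assumption above holds.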

What we have learned.

  • Dgraph handles data as a network of objects with materialized links between them. This makes Dgraph a natural choice for managing highly interconnected data.
  • One way to inject information into Dgraph is to simply describe facts in the form of RDF triples and to send a mutation request. Dgraph extends the triples with the notion of ‘facets’, which are characteristics attached to a predicate.
  • In flexible mode, facts can be added without declaring a schema, making Dgraph storage extremely flexible. We will see why one would eventually need a schema in the next episode.
  • The concepts used in the RDF model (subject, predicate, object) translate naturally for humans into entities having attributes and relations between the entities.
  • An intuitive visualization is a graph with nodes and edges: the kind of drawing we do when we sketch relations between things.
  • Dgraph offers a query language to retrieve the knowledge stored in the graph DB, with a predictable response in JSON format.

References

https://www.w3.org/TR/n-quads/

Photo by cottonbro studio