Schema or no schema?

This document is the second episode of a series that introduces Dgraph concepts through hands-on examples. It covers the notion of the Dgraph schema.

The hands-on examples are a way to better understand each concept by experimenting directly with Dgraph. They are not a substitute for product documentation.

Prerequisite

Follow Episode 1 to get a Dgraph instance up and running and load data.

You can perform all the steps of this post using a local Learning Environment, with a Dgraph instance and Ratel UI running in Docker containers.

Sample data

We are continuing the hands-on with the data loaded in Episode 1.

In case you need to re-create the data, run the following mutation from a terminal window.

curl "localhost:8080/mutate?commitNow=true" \
-s -H "Content-Type: application/rdf" -X POST  -d $'
{
  set {
    <_:jedi1> <character_name> "Luke Skywalker" .
    <_:jedi1> <eye_color> "blue" .
    <_:leia> <character_name> "Leia" .
    <_:sith1> <character_name> "Anakin" (aka="Darth Vador",villain=true) .
    <_:sith1> <has_for_child> <_:jedi1> .
    <_:sith1> <has_for_child> <_:leia> .
  }
}
' | jq

Episode 2 - Schema or no schema

We will now run some basic queries to understand why we need to tell Dgraph more about the predicates and nodes in the form of schema metadata.

Query functions and indexes

Let’s find the entities whose eye_color predicate is equal to “blue” and return the character_name of those entities.

We will explain the query language syntax in episode 3, so for now just use the following query:

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   characters(func:eq(eye_color,"blue")) {
       uid
       character_name
   }
}' | jq

The response is an error:

{
  "errors": [
    {
      "message": ": Predicate eye_color is not indexed",
      "extensions": {
        "code": "ErrorInvalidRequest"
      }
    }
  ],
  "data": null
}

If you are using Ratel UI, the Error tab of the result panel should display:

Message: : Predicate eye_color is not indexed

Dgraph is complaining that the predicate eye_color is not indexed. An index is required to be able to use certain query functions.

Let’s try another query to find the list of entities having the predicate character_name containing the term “Luke” and give the entity uid, the character_name and eye_color of those entities.

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   characters(func:anyofterms(character_name,"Luke")) {
       uid
       character_name
       eye_color
   }
}' | jq

In this case, the error is more specific:

Message: : Attribute character_name is not indexed with type term

To use the function ‘anyofterms’ on a predicate, Dgraph expects a specific index type named term.

The function documentation specifies which kind of index is needed by each function.

Certain query functions require specific index types.

So let’s add indexes by pushing a Dgraph schema to the /alter endpoint:

curl "localhost:8080/alter" --silent --request POST \
  --data $'
character_name: string @index(term) .
eye_color: string @index(hash) .
' | jq

The response should be:

{
    "data": {
        "code": "Success",
        "message": "Done"
    }
}

At this stage, the Dgraph schema is simply a list of predicate names with predicate type and indexes, in the following syntax:

character_name: string @index(term) .
eye_color: string @index(hash) .
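
To confirm what Dgraph has recorded for these predicates, the schema itself can be queried through the /query endpoint. The following is a minimal sketch, assuming the same local instance as the rest of this post. The function name show_predicate_schema and the DGRAPH_URL variable are illustrative names introduced here, not part of Dgraph.

```shell
# Illustrative helper (the name is ours, not a Dgraph command):
# ask Dgraph which type and indexes it has recorded for the two
# predicates declared above.
# DGRAPH_URL is an assumed override; it defaults to the address used in this post.
show_predicate_schema() {
  curl "${DGRAPH_URL:-localhost:8080}/query" -s \
    -H "Content-Type: application/dql" \
    -X POST \
    --data '
schema(pred: [character_name, eye_color]) {
  type
  index
  tokenizer
}' | jq
}
```

After defining the function, run show_predicate_schema. If the alter request succeeded, each predicate should be reported as a string with its index enabled and its tokenizer (term or hash) listed.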

We can now re-run the queries.

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   characters(func:eq(eye_color,"blue")) {
       uid
       character_name
   }
}' | jq

The JSON tab of the Result panel should display the result:

{
  "data": {
    "characters": [
      {
        "uid": "0x1",
        "character_name": "Luke Skywalker"
      }
    ]
  },
  ...

The uid may differ on your system. It is an internal unique id generated by Dgraph.

And the second query

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   characters(func:anyofterms(character_name,"Luke")) {
       uid
       character_name
       eye_color
   }
}' | jq

will result in

{
  "data": {
    "characters": [
      {
        "uid": "0x1",
        "character_name": "Luke Skywalker",
        "eye_color": "blue"
      }
    ]
  },
  ...

With the proper indexes declared in the Dgraph schema, the queries are working as expected!

Entity Types

We noticed that you can always add a fact about an entity using a mutation. We did that in Episode 1 when adding:

{
  set {
    <0x01> <eye_color> "blue".
  } 
}

We don’t need to tell Dgraph what the entity <0x01> is. In that sense, triples are schema-less and very flexible.

However, there are two use cases where knowing the expected predicates of a given entity will help:

  • when deleting all facts about a given entity.
  • when retrieving all the predicates of an entity: “tell me all you know about the node <0x01>”

The latter can be done with a query using the expand function.

First, let’s get the internal ID of our entities:

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{
   characters(func:has(character_name)) {
      character_name 
      uid
  }
     
}' | jq

The result is a list of all entities having a character_name:

{
  "data": {
    "characters": [
      {
        "character_name": "Luke Skywalker",
        "uid": "0x1"
      },
      {
        "character_name": "Leia",
        "uid": "0x2"
      },
      {
        "character_name": "Anakin",
        "uid": "0x3"
      }
    ]
  },
  ...

Note the uid for “Luke”, in our example “0x1”.

Replace “0x1” with the correct uid in the following queries.

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   character(func:uid(0x1)) {
       expand(_all_)
   }
}' | jq

At this point, the result is empty: Dgraph can find the entity but does not know what to expand, i.e., the list of predicates for this entity.

{
  "data": {
    "character": []
  },
  ...

We have the same issue with the delete operation. Deleting an entity is done by deleting everything Dgraph knows about this entity. This is done with a mutation using a wildcard delete.

curl "localhost:8080/mutate?commitNow=true"  -s \
-H "Content-Type: application/rdf" \
-X POST \
--data '
{ 
   delete {
     <0x01> * * .
   }
}' | jq

Replace 0x01 with the uid of your entity.

The mutation is ‘Done’, but a simple query will show that the entity is still there:

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   character(func:uid(0x1)) {
       character_name
   }
}' | jq

The delete operation using the wildcard did not delete the predicates. To produce the expected result, Dgraph needs to know the list of predicates for this entity.

This is the role of Types in the Dgraph schema.
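
For comparison, a delete that names the predicate explicitly does not rely on a type, because Dgraph is told exactly which edges to remove. The following is a minimal sketch under that assumption. The function name delete_predicate and the DGRAPH_URL variable are illustrative names introduced here. It is not part of the walkthrough: running it against Luke would remove data that the next steps expect to find.

```shell
# Illustrative helper (the name is ours): delete every value of ONE
# named predicate on ONE entity, using the "S P *" delete pattern.
#   $1 = uid of the entity, e.g. 0x1
#   $2 = predicate name, e.g. eye_color
# DGRAPH_URL is an assumed override; it defaults to the address used in this post.
delete_predicate() {
  curl "${DGRAPH_URL:-localhost:8080}/mutate?commitNow=true" -s \
    -H "Content-Type: application/rdf" \
    -X POST \
    --data "
{
   delete {
     <$1> <$2> * .
   }
}" | jq
}
```

For example, delete_predicate 0x1 eye_color would remove only the eye_color fact. The drawback is that the caller must already know every predicate name, which is the information a type provides to Dgraph.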

Let’s define a type Character with the list of predicates a Character may have:

curl "localhost:8080/alter" --silent --request POST \
  --data $'
character_name: string @index(term) .
eye_color: string @index(hash) .
has_for_child: [uid] .

type Character {
  character_name
  eye_color
  has_for_child
}
' | jq

Verify that the result is a “Success”:

{
  "data": {
    "code": "Success",
    "message": "Done"
  }
}

We just updated the Dgraph schema and declared that an entity of type Character may have facts about character_name, eye_color, has_for_child.
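
The type definition can be read back in the same way as the predicate schema. The following is a minimal sketch, assuming the same local instance. The function name show_character_type and the DGRAPH_URL variable are illustrative names introduced here.

```shell
# Illustrative helper (the name is ours): ask Dgraph for the
# definition of the Character type that was just declared.
# DGRAPH_URL is an assumed override; it defaults to the address used in this post.
show_character_type() {
  curl "${DGRAPH_URL:-localhost:8080}/query" -s \
    -H "Content-Type: application/dql" \
    -X POST \
    --data 'schema(type: Character) {}' | jq
}
```

Running show_character_type should return the Character type together with the predicates listed in its definition.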

We need to tell Dgraph that our entities are of type Character. To do that, we save a fact, i.e., a triple, using the reserved predicate dgraph.type.

curl "localhost:8080/mutate?commitNow=true" \
-s -H "Content-Type: application/rdf" -X POST  -d $'
upsert {
  query {
    characters as var(func: has(character_name))
  }

  mutation  {
    set {
      uid(characters) <dgraph.type> "Character" .
    }
  }
}' | jq

At this point, every entity that has a character_name is of type Character, and Dgraph knows the predicates for type Character.
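
One way to check the type assignment is to select entities by type and return dgraph.type alongside the name. The following is a minimal sketch, assuming the upsert above succeeded. The function name list_characters and the DGRAPH_URL variable are illustrative names introduced here.

```shell
# Illustrative helper (the name is ours): list every entity whose
# dgraph.type is Character, using the DQL type() function.
# DGRAPH_URL is an assumed override; it defaults to the address used in this post.
list_characters() {
  curl "${DGRAPH_URL:-localhost:8080}/query" -s \
    -H "Content-Type: application/dql" \
    -X POST \
    --data '
{
   characters(func: type(Character)) {
       uid
       character_name
       dgraph.type
   }
}' | jq
}
```

Running list_characters should return the sample characters, each with Character as its dgraph.type.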

We can re-test our query using expand:

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   character(func:uid(0x1)) {
       expand(_all_)
   }
}' | jq

The result should now provide the predicate values:

{
  "data": {
    "character": [
      {
        "eye_color": "blue",
        "character_name": "Luke Skywalker"
      }
    ]
  },

Let’s delete all the facts about this entity in a mutation:

curl "localhost:8080/mutate?commitNow=true"  -s \
-H "Content-Type: application/rdf" \
-X POST \
--data '
{ 
   delete {
     <0x01> * * .
   }
}' | jq

and re-run the query

curl "localhost:8080/query"  -s \
-H "Content-Type: application/dql" \
-X POST \
--data '
{ 
   character(func:uid(0x1)) {
        expand(_all_)
   }
}' | jq

to verify that Dgraph no longer has any information about this entity.

What we have learned

  • An index is required to be able to use certain filter functions when executing a query.
  • Dgraph supports different types of indexes that must be selected depending on the functions we want to use.
  • We specify predicate types and indexes in the Dgraph schema (aka DQL schema).
  • We can modify the Dgraph schema and create indexes at any time when we see the need.
  • To use the ‘expand’ function or to be able to delete all predicates of an entity using a wildcard, we need to:
    • define a type and list all predicates for this type in the Dgraph schema.
    • use the predicate ‘dgraph.type’ to declare the type of entities.

Notes

It is a best practice to create a schema with the proper indexes before saving a large volume of facts. Adding an index later forces Dgraph to build it over all the existing data, which can take time on a large dataset.
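
Following the same schema-first idea, the type of an entity can be declared in the mutation that creates it, instead of with a later upsert. The following is a minimal sketch, assuming the Character type from this episode has already been pushed to /alter. The `_:yoda` node is hypothetical sample data that is not part of the series, and the function name add_typed_character and the DGRAPH_URL variable are illustrative names introduced here.

```shell
# Illustrative helper (the name is ours): create a new entity and
# declare its type in the same mutation.
# The "Yoda" node is hypothetical sample data, not part of the series.
# DGRAPH_URL is an assumed override; it defaults to the address used in this post.
add_typed_character() {
  curl "${DGRAPH_URL:-localhost:8080}/mutate?commitNow=true" -s \
    -H "Content-Type: application/rdf" \
    -X POST \
    --data '
{
  set {
    <_:yoda> <character_name> "Yoda" .
    <_:yoda> <dgraph.type> "Character" .
  }
}' | jq
}
```

An entity created this way can be used with expand(_all_) and with a wildcard delete without any further step.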
