Cracking DQL: Variable Propagation

Variable Propagation in Algorithms

Variable propagation is a powerful technique in programming and algorithms. It optimizes performance by reducing unnecessary recalculations and improving overall efficiency. This concept refers to passing known variable values across various components of an algorithm, i.e. “propagating” the values of variables so that they can be reused whenever needed, without recomputing them.

Variable Propagation in Dgraph

In Dgraph’s Query Language (DQL), variable propagation is used to optimize query performance and make graph traversals more efficient. In Dgraph, variable propagation refers to the ability to assign intermediate results to variables within a query and then reuse those variables later in the same query. Instead of re-traversing the same graph nodes, DQL allows you to propagate variables throughout the query, ensuring more efficient and readable code.

Variable propagation is especially useful in recommendation systems, and content-based filtering. Refer to the blog post Build a Realtime Recommendation Engine: Part 1 for details.

In those types of use cases, we are searching for similar sets. For example, how do you compare the list of friends of user A (set A) and the list of friends of user B (set B). For two sets A and B the Jaccard distance is given by

$$1-\frac{|A\cap B|}{| A\cup B |}$$

But let’s start with something simpler to understand variable propagation in DQL.

Most mutual ’things'

In many social network analyses, we have to find the persons we have the most mutual ‘follows’, ‘products’, ‘contacts’, etc. with.

For example, the pattern appears in product recommendations based on the user shopping cart content and other users’ past purchases:

find who I have the most common products with.

In this case, finding the persons I have the most common products with can be used to recommend the product they bought and I don’t have yet. We have introduced different relationships to illustrate that the graph traversal is not necessarily limited to one relation.

The same pattern can be applied in numerous use case. Let’s see the DQL details for a use case related to persons, the persons they follow and the persons they have in their contacts using phonenumber.

Common ‘follows’

Perhaps the most common use case in ‘social network’, let’s find the persons I have the most common ‘follows’ with.

find the users (Bs), A has the most mutual 'follows' with

Assuming that

  • Persons are identified by a username
  • we have a relationship follows in Dgraph with a reverse relationship ~follows. Refer to Dgraph documentation about reverse edges.

Here is a DQL query finding the person having the most common ‘follows’ with person A.

{
  userA as var(func: eq(username, "A")) { 
  # use a named variable userA to be able to exclude this node later in the query
    c as math(1) # start c =1 on user A Node
    follows_of_userA as follows {
        # c is propagated, each follow is reached one time so c =1 for every follow
      ~follows @filter(NOT uid(userA)) {
        # ~follows is the reverse relationship
        # users at this level are reached by all the common follows, 
        # c = sum all path = count of common follows
        # keep the value in a variable, 
        # in Dgraph a variable is a map uid -> value, so we have the count for every target
                mutual_follows as math(c)
      }
    }
  }
    
  target_user(func: uid(mutual_follows), orderdesc: val(mutual_follows), first:1) {
    username
    mutual_follows_count: val(mutual_follows)
    mutual_follows: follows @filter(uid(follows_of_userA)) {
      username
    }
  }
}

The variable we ‘propagate’ is c. We initialize it at 1 on the userA node. When used in a block nested within the block that defines the variable, the value is computed as a sum of the variable for parent nodes along all paths to the point of use.

We use c variable in the block ~follows @filter(NOT uid(userA)). The graph query can reach this block by all the common follows so the value of c in this block is the sum of all common follows.

We store the value in the value variable mutual_follows.

A value variable is a map from the UIDs (node identfier) of the enclosing block to the corresponding values, so we have the count of common follows for all persons with at least one common follows.

We use this value variable to sort it by value and take the first node, which is the person we are looking for.

Note that we can change the last block to get all persons with at least 5 common follows using a filter:

...
  target_user(func: uid(mutual_follows), orderdesc: val(mutual_follows))  @filter(gt(val(mutual_follows),5)) {
    username
    mutual_follows_count: val(mutual_follows)
    mutual_follows: follows @filter(uid(follows_of_userA)) {
      username
    }
  }
}

Common ‘contacts’

In the case of follows, we have only one relationship to traverse the graph. We can apply the exact same logic to more complex graph traversal.

Let’s find the persons with the most contacts in common:

find persons (Bs) A has the most contacts in common.

Assuming that

  • Persons are identified by an attribute username
  • Phonenumber nodes have an attribute phone_number
  • the relationhsip belongs_to has a reverse ~belongs_to
  • the relationship has_in_contacts has a reverse ~has_in_contacts
 
{
  var(func: eq(username, "A")) {
    c as math(1)
    userA_phone_number as ~belongs_to {
      userA_contacts as has_in_contacts {
        ~has_in_contacts @filter(NOT uid(userA_phone_number)) {
          belongs_to{
              mutual_contacts as Math(c)
          }
        }
      }
    }
  }
  
  
  target_user(func: uid(mutual_contacts), orderdesc: val(mutual_contacts), first: 1) {
    username
    mutual_contact_count:val(mutual_contacts)
    phone:~belongs_to {
      phone_number
      mutual_contacts: has_in_contacts @filter(uid(userA_contacts))  {
        phone_number 
        belongs_to {
          username
        }
      }
    }
  }
}

The pattern is the same as the follows use case. We just had to adapt the query to traverse the graph as we need. This time we needed to go from user to phonenumber node and from phonenumber to contacts and reverse.

Scoring function

The organization of DQL query in block makes it easy to combine the result of different variables to compute a score. Imagine we want to score ‘similar’ users to a given user by a formula

# Example of similarity formula
similarity_score = 0.4 * number of common 'follows' + 0.6 * number of common 'contacts'

We can use a DQL query merging the blocks above

{
  userA as var(func: eq(username, "A")) { 
    c as math(1) # start c =1 on user A Node
    # first block to compute mutual follows using variable propagation
    follows_of_userA as follows {
      ~follows @filter(NOT uid(userA)) {
                mutual_follows as math(c)
      }
    }
    # second block to compute mutual contacts using same variable !
    # it propagates on a different path.
    userA_phone_number as ~belongs_to {
      userA_contacts as has_in_contacts {
        ~has_in_contacts @filter(NOT uid(userA_phone_number)) {
          belongs_to{
              mutual_contacts as Math(c)
          }
        }
      }
    }
  }

# compute a score using the formula
  var(func: uid(mutual_follows, mutual_contacts)) {
    score as math(0.4 * mutual_follows + 0.6 * mutual_contacts)
  }
# get target info
  target(func: uid(score), orderdesc: val(score), first: 1) {
    username
    score: val(score)
    count_mutual_follows: val(mutual_follows)
    count_mutual_contacts: val (mutual_contacts)
  }
}

Performance

There are usually different way to traverse a graph and get the information you need. For the initial request about follows, here is a way to find the result without variable aggregation:

{
  userA as var(func: eq(username, "A")) {
    follows_of_userA as follows {
      ~follows @filter(NOT uid(userA)) {
        other_users as uid
      }
    }
  }

  var(func: uid(other_users)) {
    mutual_follows as count(follows @filter(uid(follows_of_userA)))
  }

  target_user(func: uid(mutual_follows), orderdesc: val(mutual_follows), first: 1) {
    username
    mutual_follows_count: val(mutual_follows)
    mutual_follows: follows @filter(uid(follows_of_userA)) {
      username
    }
  }
}

The second block of this query is doing the count, but it traverse the graph again.

For this use case, the query using variable propagation has proven to be 10 times faster even with 10 thousands users.

Conclusion

Variable propagation in Dgraph is a powerful feature that allows for

  • Query Optimization: By reusing the results of previous computations (stored in variables), Dgraph ensures that graph nodes aren’t traversed multiple times unnecessarily. This leads to faster query execution, especially when dealing with large datasets.

  • Code Clarity: Propagating variables simplifies the query structure, making it easier to follow the logic of complex traversals.

  • Efficiency: In graph databases, efficiency is critical due to the often exponential growth in graph size. Variable propagation in Dgraph helps in limiting the scope of operations by directly referencing previously computed nodes, reducing the overall complexity of the query.

Whether you’re building complex graph traversals or simplifying your query logic, understanding how to leverage variable propagation in DQL will help you get the most out of Dgraph.

Photo by cottonbro studio