Variable propagation is a powerful technique in programming and algorithms. It optimizes performance by reducing unnecessary recalculations and improving overall efficiency. This concept refers to passing known variable values across various components of an algorithm, i.e. “propagating” the values of variables so that they can be reused whenever needed, without recomputing them.
In Dgraph’s Query Language (DQL), variable propagation is used to optimize query performance and make graph traversals more efficient. In Dgraph, variable propagation refers to the ability to assign intermediate results to variables within a query and then reuse those variables later in the same query. Instead of re-traversing the same graph nodes, DQL allows you to propagate variables throughout the query, ensuring more efficient and readable code.
Variable propagation is especially useful in recommendation systems, and content-based filtering. Refer to the blog post Build a Realtime Recommendation Engine: Part 1 for details.
In those types of use cases, we are searching for similar sets. For example, how do you compare the list of friends of user A (set A) and the list of friends of user B (set B). For two sets A and B the Jaccard distance is given by
$$1-\frac{|A\cap B|}{| A\cup B |}$$
But let’s start with something simpler to understand variable propagation in DQL.
In many social network analyses, we have to find the persons we have the most mutual ‘follows’, ‘products’, ‘contacts’, etc. with.
For example, the pattern appears in product recommendations based on the user shopping cart
content and other users’ past purchases:
In this case, finding the persons I have the most common products with can be used to recommend the product they bought and I don’t have yet. We have introduced different relationships to illustrate that the graph traversal is not necessarily limited to one relation.
The same pattern can be applied in numerous use case. Let’s see the DQL details for a use case related to persons
, the persons they follow
and the persons they have in their contacts
using phonenumber
.
Perhaps the most common use case in ‘social network’, let’s find the persons I have the most common ‘follows’ with.
Assuming that
Persons
are identified by a username
follows
in Dgraph with a reverse relationship ~follows
.
Refer to Dgraph documentation about reverse edges.Here is a DQL query finding the person having the most common ‘follows’ with person A.
{
userA as var(func: eq(username, "A")) {
# use a named variable userA to be able to exclude this node later in the query
c as math(1) # start c =1 on user A Node
follows_of_userA as follows {
# c is propagated, each follow is reached one time so c =1 for every follow
~follows @filter(NOT uid(userA)) {
# ~follows is the reverse relationship
# users at this level are reached by all the common follows,
# c = sum all path = count of common follows
# keep the value in a variable,
# in Dgraph a variable is a map uid -> value, so we have the count for every target
mutual_follows as math(c)
}
}
}
target_user(func: uid(mutual_follows), orderdesc: val(mutual_follows), first:1) {
username
mutual_follows_count: val(mutual_follows)
mutual_follows: follows @filter(uid(follows_of_userA)) {
username
}
}
}
The variable we ‘propagate’ is c
. We initialize it at 1
on the userA node.
When used in a block nested within the block that defines the variable, the value is computed as a sum of the variable for parent nodes along all paths to the point of use.
We use c
variable in the block ~follows @filter(NOT uid(userA))
.
The graph query can reach this block by all the common follows
so the value of c
in this block is the sum of all common follows
.
We store the value in the value variable mutual_follows
.
A value variable is a map from the UIDs (node identfier) of the enclosing block to the corresponding values, so we have the count of common follows
for all persons with at least one common follows
.
We use this value variable to sort it by value and take the first node, which is the person we are looking for.
Note that we can change the last block to get all persons with at least 5 common follows
using a filter:
...
target_user(func: uid(mutual_follows), orderdesc: val(mutual_follows)) @filter(gt(val(mutual_follows),5)) {
username
mutual_follows_count: val(mutual_follows)
mutual_follows: follows @filter(uid(follows_of_userA)) {
username
}
}
}
In the case of follows
, we have only one relationship to traverse the graph.
We can apply the exact same logic to more complex graph traversal.
Let’s find the persons with the most contacts
in common:
Assuming that
Persons
are identified by an attribute username
Phonenumber
nodes have an attribute phone_number
belongs_to
has a reverse ~belongs_to
has_in_contacts
has a reverse ~has_in_contacts
{
var(func: eq(username, "A")) {
c as math(1)
userA_phone_number as ~belongs_to {
userA_contacts as has_in_contacts {
~has_in_contacts @filter(NOT uid(userA_phone_number)) {
belongs_to{
mutual_contacts as Math(c)
}
}
}
}
}
target_user(func: uid(mutual_contacts), orderdesc: val(mutual_contacts), first: 1) {
username
mutual_contact_count:val(mutual_contacts)
phone:~belongs_to {
phone_number
mutual_contacts: has_in_contacts @filter(uid(userA_contacts)) {
phone_number
belongs_to {
username
}
}
}
}
}
The pattern is the same as the follows
use case. We just had to adapt the query to traverse the graph as we need. This time we needed to go from user
to phonenumber
node and from phonenumber
to contacts
and reverse.
The organization of DQL query in block makes it easy to combine the result of different variables to compute a score. Imagine we want to score ‘similar’ users to a given user by a formula
# Example of similarity formula
similarity_score = 0.4 * number of common 'follows' + 0.6 * number of common 'contacts'
We can use a DQL query merging the blocks above
{
userA as var(func: eq(username, "A")) {
c as math(1) # start c =1 on user A Node
# first block to compute mutual follows using variable propagation
follows_of_userA as follows {
~follows @filter(NOT uid(userA)) {
mutual_follows as math(c)
}
}
# second block to compute mutual contacts using same variable !
# it propagates on a different path.
userA_phone_number as ~belongs_to {
userA_contacts as has_in_contacts {
~has_in_contacts @filter(NOT uid(userA_phone_number)) {
belongs_to{
mutual_contacts as Math(c)
}
}
}
}
}
# compute a score using the formula
var(func: uid(mutual_follows, mutual_contacts)) {
score as math(0.4 * mutual_follows + 0.6 * mutual_contacts)
}
# get target info
target(func: uid(score), orderdesc: val(score), first: 1) {
username
score: val(score)
count_mutual_follows: val(mutual_follows)
count_mutual_contacts: val (mutual_contacts)
}
}
There are usually different way to traverse a graph and get the information you need.
For the initial request about follows
, here is a way to find the result without variable aggregation:
{
userA as var(func: eq(username, "A")) {
follows_of_userA as follows {
~follows @filter(NOT uid(userA)) {
other_users as uid
}
}
}
var(func: uid(other_users)) {
mutual_follows as count(follows @filter(uid(follows_of_userA)))
}
target_user(func: uid(mutual_follows), orderdesc: val(mutual_follows), first: 1) {
username
mutual_follows_count: val(mutual_follows)
mutual_follows: follows @filter(uid(follows_of_userA)) {
username
}
}
}
The second block of this query is doing the count, but it traverse the graph again.
For this use case, the query using variable propagation has proven to be 10 times faster even with 10 thousands users.
Variable propagation in Dgraph is a powerful feature that allows for
Query Optimization: By reusing the results of previous computations (stored in variables), Dgraph ensures that graph nodes aren’t traversed multiple times unnecessarily. This leads to faster query execution, especially when dealing with large datasets.
Code Clarity: Propagating variables simplifies the query structure, making it easier to follow the logic of complex traversals.
Efficiency: In graph databases, efficiency is critical due to the often exponential growth in graph size. Variable propagation in Dgraph helps in limiting the scope of operations by directly referencing previously computed nodes, reducing the overall complexity of the query.
Whether you’re building complex graph traversals or simplifying your query logic, understanding how to leverage variable propagation in DQL will help you get the most out of Dgraph.
Photo by cottonbro studio