Building a Stack Overflow Clone with Dgraph and React

I have recently built a Stack Overflow clone with Dgraph and React. I was delightfully surprised by the pleasant developer experience and the performance of my application. In this post, I would like to tell the story of how I built Graphoverflow and share the best practices I learned for using Dgraph to build a modern web application.

As you can see in the live demo, Graphoverflow implements all the core functionality of Stack Overflow. Through the web interface, you can create, read, update, and delete questions, answers, and comments. You can also cast upvotes and downvotes. In addition, it recommends questions to read based on collaborative filtering. Graphoverflow thus embodies the features that most modern web applications depend on: CRUD operations, user authentication, and authorization. As the source code shows, all of this is achieved with Dgraph as the primary and only data store.

Looking back on the past three weeks of building Graphoverflow, I feel that the journey was unexpectedly simple and straightforward. I had never built anything with a graph database before, so when I started building this application I knew I was in for a bumpy ride. However, the intuitive query language of Dgraph did the heavy lifting out of the box, and I did not have to struggle too much. Graphoverflow is not a simple application by any means: it needs to retrieve and render a large amount of data with complex relationships, and it has a built-in recommendation system. Yet a quick analysis of the code base reveals that I only had to write 700 LOC (18%) on the server side, and 3300 LOC (82%) on the client side to ship it, excluding

# Server side
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                       7             94             21            654
JSON                             1              0              0             39
-------------------------------------------------------------------------------
SUM:                             8             94             21            693
-------------------------------------------------------------------------------


# Client side
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                      62            391             74           2880
Sass                            17             91              1            461
-------------------------------------------------------------------------------
SUM:                            79            482             75           3341
-------------------------------------------------------------------------------

The simplicity of the code base clearly demonstrates that, with Dgraph, you can end up with a lot less complex application code for your non-trivial requirements.

The Query Language Was The Key

I believe that Graphoverflow owes its simple code base and fast iteration cycle to Dgraph’s homebred query language, GraphQL+-. To me, the biggest benefit of the query language was that I could retrieve complex data by writing a single, intuitive, tree-like structure. I could spend less time worrying about what tables to create or join together to store and fetch the data that the front-end required. Instead, I could simply focus on polishing my front-end components, while relying on Dgraph’s flexible schema system and its powerful query language.

GraphOverflow question

The inner workings of the question page shown above shed light on Dgraph’s ability to retrieve complex data in a straightforward way. To render this page on the front-end, I needed to fetch the question and all of its answers. For each of them, I needed to fetch the total number of upvotes and downvotes, the view count, the history, the comments, and those comments’ respective scores. On top of that, I needed to fetch 30 related questions based on the tags attached to the displayed question. These data have complex schemas and interdependent relationships with one another. The queries I wrote, on the other hand, felt very straightforward to me.

{
  question(func: uid(0xa421)) {
    Title {
      Text
    }
    Body {
      Text
    }
    ViewCount
    UpvoteCount: count(Upvote)
    DownvoteCount: count(Downvote)

    Comment {
      ...
    }

    Has.Answer {
      Comment {
        ...
      }
      ...
    }
    ...
  }
}

The above is the query responsible for fetching the question, its answers, and their relevant properties and children such as comments. A GraphQL+- query consists of a root node and its nested blocks, and when executed it returns a result in the form of a subgraph. A query can have multiple root nodes, and you can see the whole query here on GitHub. In short, the above query returns a result shaped like the following code block. Starting from the root node, question, Dgraph finds and returns all the child nodes represented by the nested blocks.

{
  "question": [
    {
      "Timestamp": "2015-10-06T15:34:33.75Z",
      "Title": [
        {
          "Text": "How can I keep my cat off my keyboard?"
        }
      ],
      "Body": [
        {
          "Text": "..."
        }
      ],
      "ViewCount": 21767,
      "UpvoteCount": 169,
      "DownvoteCount": 2,
      "Comment": [ { ... }, ... ],
      "Has.Answer": [
        {
          "Comment": [ { ... }, ... ],
          ...
        },
        ...
      ]
    }
  ]
}
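
To make the shape concrete, here is a minimal sketch of how a front-end helper might read values out of such a response. The helper name summarizeQuestion and the sample object are illustrative assumptions, not code taken from Graphoverflow.

```javascript
// Hypothetical helper: flattens the nested response into the few values
// a page header might need. Field names follow the sample response above.
function summarizeQuestion(response) {
  const question = response.question[0]; // the root block returns an array
  return {
    title: question.Title[0].Text, // nested blocks are arrays as well
    views: question.ViewCount,
    score: question.UpvoteCount - question.DownvoteCount,
    // Fall back to an empty list in case the question has no answers.
    answerCount: (question["Has.Answer"] || []).length
  };
}

// Sample data shaped like the response above (the answers are made up).
const sampleResponse = {
  question: [
    {
      Title: [{ Text: "How can I keep my cat off my keyboard?" }],
      ViewCount: 21767,
      UpvoteCount: 169,
      DownvoteCount: 2,
      "Has.Answer": [{ Comment: [] }, { Comment: [] }]
    }
  ]
};

console.log(summarizeQuestion(sampleResponse));
```

Note that every level of nesting in the query becomes an array in the result, which is why the title is read as Title[0].Text.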

Warning

Query syntax and features have changed in the v1.0 release. For the latest syntax and features, see the Dgraph documentation.

Using in React

To query Dgraph and use the result in a React application, we can use the component lifecycle methods provided by React. When the component responsible for displaying the question mounts to the DOM, Graphoverflow sends a query to Dgraph and displays a loading screen until the result arrives. It therefore handles the data fetching in the componentDidMount lifecycle method, which is invoked immediately after a component mounts. The simplified code looks like the following:

import React from 'react';
import { runQuery } from "../lib/helpers";
import { getQuestionQuery } from "../queries/Question";

class Question extends React.Component {
  constructor(props) {
    super(props);

    this.state = {
      questionLoaded: false,
      question: {}
    };
  }

  componentDidMount() {
    const questionUID = this.props.match.params.uid;
    const query = getQuestionQuery(questionUID);

    runQuery(query).then(res => {
      const question = res.question[0];
      // The key must match the questionLoaded flag declared in the constructor.
      this.setState({ question, questionLoaded: true });
    })
  }

  ...
}

Three things need to happen in componentDidMount: constructing a query for the question, sending the query to the server, and storing the result so that it can be rendered. Let us look at this simple process step by step.

  • Since we need to fetch a question with a specific _uid_, we need to build the query string dynamically using string interpolation. getQuestionQuery is a factory that takes a questionUID and returns a string representing the query.

    function getQuestionQuery(questionUID) {
      // The outer braces wrap the whole query, as in the full example above.
      return `{
        question(func: uid(${questionUID})) {
          ...
        }
      }`;
    }
    
  • Once we have the query, we simply make an HTTP POST request to the Dgraph server. runQuery is a helper function that returns a promise that resolves with the JSON response from the Dgraph server. It uses superagent, but you can use other solutions such as fetch.

    import request from "superagent";

    function runQuery(queryText) {
      const endpointBaseURL = 'http://localhost:3030';

      return request.post(`${endpointBaseURL}/query`).send(queryText).then(res => {
        return JSON.parse(res.text);
      });
    }

  • When we get the response, we store it in application state so that the data can be rendered in the browser. Graphoverflow persists the data in the component’s state to keep the demonstration simple. However, you can easily integrate a state management library such as Redux or MobX into this step.
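
For readers who prefer fetch over superagent, the second step might look roughly like the sketch below. It assumes the same endpoint as the superagent version; the fetchImpl parameter is a hypothetical addition that lets the function be exercised without a running server.

```javascript
// Sketch of runQuery built on fetch instead of superagent.
// fetchImpl is a hypothetical parameter; by default the global fetch is used.
function runQuery(queryText, fetchImpl = fetch) {
  const endpointBaseURL = "http://localhost:3030";

  return fetchImpl(`${endpointBaseURL}/query`, {
    method: "POST",
    body: queryText
  }).then(res => {
    // fetch does not reject on HTTP error statuses, so check explicitly.
    if (!res.ok) {
      throw new Error(`Query failed with status ${res.status}`);
    }
    return res.json(); // resolves with the parsed JSON body
  });
}
```

Because this version still returns a promise that resolves with the parsed response, the component code that calls runQuery would not need to change.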

Having completed the above process in componentDidMount, we can now render the question page. Here is a simplified version of the render method of the Question component.

render() {
  const {
    question,
    questionLoaded,
  } = this.state;

  if (!questionLoaded) {
    return <Loading />;
  }

  return (
    <div>
      <QuestionLayout question={question} />
    </div>
  );
}

Once I got the hang of the query language, it felt very natural using Dgraph with React simply because there was nothing new or paradigm-shifting. The process described above is exactly how I would use PostgreSQL or MongoDB in a purely client-side rendered Single Page Application.

The mechanism described here can be adapted to support more advanced use cases such as server-side rendering: we can move the whole process into a static method of the component that returns a promise, call it on the server side, and delay the render until the response has arrived and the promise has resolved. We can even integrate Redux with ease so that the app state can be rehydrated on the client side.
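
The static-method idea can be sketched as follows. This is only an outline under stated assumptions: React is left out so that the sketch runs on its own, runQuery and getQuestionQuery are stubs standing in for the helpers shown earlier, and the names fetchData and renderOnServer are hypothetical.

```javascript
// Stubs standing in for the real helpers so the sketch is self-contained.
const getQuestionQuery = uid => `{ question(func: uid(${uid})) { Title { Text } } }`;
const runQuery = query =>
  Promise.resolve({ question: [{ Title: [{ Text: "Stub title" }] }] });

class Question {
  // Static, so the server can call it without mounting the component.
  static fetchData(params) {
    return runQuery(getQuestionQuery(params.uid)).then(res => ({
      question: res.question[0],
      questionLoaded: true
    }));
  }
}

// Waits for the data, then hands the initial state to a render function.
// In a real app, render would wrap something like ReactDOMServer.renderToString.
async function renderOnServer(Component, params, render) {
  const initialState = await Component.fetchData(params);
  return render(initialState);
}

renderOnServer(Question, { uid: "0xa421" }, state =>
  `<h1>${state.question.Title[0].Text}</h1>`
).then(html => console.log(html));
```

The same initial state could also be serialized into the page so that the client starts from the data the server already fetched.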

Best Practices

Through building Graphoverflow, I have come up with some practices that I believe will help keep your code base maintainable and save time. They are based on lessons from numerous trials, errors, and redesign attempts. Often a lack of documented practices is the biggest barrier keeping us from dipping our toes into unfamiliar technologies and reaping their benefits. The following practices should help you get started quickly with building fast modern web applications with Dgraph and React.

Organize Your Queries Along With Your Components

It is a good idea to keep a component together with all the queries that the component relies on for data. The reason is that you need to know the shape of the JSON response from the Dgraph server in order to render your components, and the shape of the response can vary for two main reasons. Names of predicates can change because a query can use an alias. Alternatively, a nested block in a query might be omitted, either on purpose or by mistake. Those situations can suddenly take away fields that your React component assumes are present. If you have a chain of attribute getters and one of those attributes is missing, you will end up chaining a getter on undefined. Prior to React 16, such a scenario could lead to cryptic error messages that are hard to debug.

Here is an example of how a change in Dgraph response can cause an error in your React application. The query for the question page includes the following:

{
  question(func: uid(0xa421)) {
    Title {
      Text
    }
    ...
  }
}

In the Question component, Graphoverflow renders the title while assuming the result has the shape of { question: [ { Title: [ { Text } ], ... } ] }:

class Question extends React.Component {
  ...
  render() {
    ...

    return (
      ...
      <h1 className="post-title">{post.Title[0].Text}</h1>
      ...
    );
  }
}

If we change the query to the following, the component will not work anymore because post.Title will be undefined.

{
  question(func: uid(0xa421)) {
    question_title: Title {
      Text
    }
    ...
  }
}
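
The failure can be reproduced in isolation. The sketch below uses made-up response objects and two hypothetical helpers, readTitle and readTitleSafely; it assumes the renamed field arrives under the key question_title.

```javascript
// Response for the original query: the key is the predicate name.
const original = { question: [{ Title: [{ Text: "A title" }] }] };
// Response once the field is renamed: the key is now question_title.
const aliased = { question: [{ question_title: [{ Text: "A title" }] }] };

// Mirrors the access chain used in the component above.
function readTitle(response) {
  const post = response.question[0];
  return post.Title[0].Text;
}

// Defensive variant: returns a fallback instead of throwing.
function readTitleSafely(response, fallback = "") {
  const post = (response.question || [])[0] || {};
  const title = (post.Title || [])[0] || {};
  return title.Text === undefined ? fallback : title.Text;
}

console.log(readTitle(original)); // "A title"
try {
  readTitle(aliased);
} catch (err) {
  console.log(err.name); // "TypeError"
}
console.log(readTitleSafely(aliased, "(untitled)")); // "(untitled)"
```

A defensive getter keeps the page from crashing, but it also hides the mismatch, which is one more reason to keep queries and components side by side.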

Your queries will evolve as your requirements change. Iterating on them can be error-prone if they are scattered across the code base without a formal structure. Therefore, I recommend the following directory structure to mitigate such issues and keep your code base maintainable.

your_project
├── components
│   ├── EditPost.js
│   ├── Home.js
│   └── Question.js
├── queries
│   ├── EditPost.js
│   ├── Home.js
│   └── Question.js
...

Within your component, import the queries it needs from the corresponding query file. This way, you can iterate on the queries for your components in a reliable manner. It is clean and reassuring to have a separate, go-to file that shows all the queries your component relies on. It keeps possible errors in check and makes debugging easier when an error does occur.

Use Functions to Dynamically Construct Queries

When your query needs to be built dynamically, write a function that takes values and returns a query string using those values. In other words, make a factory that generates queries rather than doing string interpolation directly in your component. We have already seen an example of such a function, one that generates a query for a specific question:

function getQuestionQuery(questionUID) {
  // The outer braces wrap the whole query, as in the full example earlier.
  return `{
    question(func: uid(${questionUID})) {
      ...
    }
  }`;
}

Many occasions will arise in which you need to generate a query dynamically. The most common cases are fetching a single resource by its unique identifier and posting information while performing mutations. On such occasions, simply export functions like the example above from the appropriate query file that you established following the previous best practice. Doing so allows you to keep all the queries for a component in a single file in a clean way.
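
Because a factory interpolates its argument straight into the query text, it may be worth validating the value first. The following is a minimal sketch, assuming UIDs are hexadecimal strings like the 0xa421 used in the examples above; UID_PATTERN is a hypothetical name, and the query body is shortened to a single field.

```javascript
// Assumption: UIDs look like 0xa421 (the prefix 0x followed by hex digits).
const UID_PATTERN = /^0x[0-9a-f]+$/i;

function getQuestionQuery(questionUID) {
  // Refuse anything that is not a plain UID before it reaches the query.
  if (!UID_PATTERN.test(questionUID)) {
    throw new Error(`Invalid UID: ${questionUID}`);
  }

  return `{
    question(func: uid(${questionUID})) {
      Title {
        Text
      }
    }
  }`;
}

console.log(getQuestionQuery("0xa421"));
```

In a project laid out as above, this function would be exported from queries/Question.js, so the check lives next to the query it protects.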

Use Fragments for Repeated Structures

If the queries for one of your components exhibit repeated structures, consider extracting those structures as fragments that can be reused. Doing so helps avoid a mismatch between the shape of the Dgraph response and your React component’s expectation of it, the common pitfall that we identified above.

GraphOverflow homepage

As we can see from the picture, the Home component of Graphoverflow can fetch questions according to three different criteria: ‘Recommended’, ‘Most Recent’, and ‘Hot’. While these criteria fetch different sets of questions, the returned fields must be consistent, because a single React component is responsible for rendering a question item no matter which criterion is used to fetch it.

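import React from "react";
// Assumption: Link comes from React Router, which the match.params usage earlier suggests.
import { Link } from "react-router-dom";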

const QuestionItem = ({ question, history }) => {
  return (
    <li>
      ...
      {question.AnswerCount}
      ...
      {question.ViewCount}
      ...
      <Link to={questionLink} className="question-title">
        {JSON.stringify(question.Title[0].Text)}
      </Link>
    </li>
  );
}

There are three different queries in queries/Home.js, all fetching data required for rendering QuestionItem while following different criteria.

export const recommendedQuestionQuery = `{
  ...
  questions(...) {
    _uid_
    Title {
      Text
    }
    ...
  }
}`;
export const hotQuestionQuery = `{
  ...
  questions(...) {
    _uid_
    Title {
      Text
    }
    ...
  }
}`;
export const recentQuestionQuery = `{
  ...
  questions(...) {
    _uid_
    Title {
      Text
    }
    ...
  }
}`;

If we happen to under-fetch a field for questions in one of these queries, the data for our React component will be incomplete, causing run-time errors. Therefore, it is much more sensible to abstract out the common parts as a ‘fragment’. Doing so reduces the chance of such errors in future iterations.

const questionFragment = `
_uid_
Title {
  Text
}
...
`;
export const recommendedQuestionQuery = `{
  ...
  questions(...) {
    ${questionFragment}
  }
}`;
export const hotQuestionQuery = `{
  ...
  questions(...) {
    ${questionFragment}
  }
}`;
export const recentQuestionQuery = `{
  ...
  questions(...) {
    ${questionFragment}
  }
}`;

Conclusion

Building an application with Dgraph left me with a strange aftertaste. Usually, a new technology inevitably leaves a feeling of ennui, after the excitement of trying out something new eventually subsides. Yet this time, I felt something quite different. Perhaps it was a taste of delight, a hint at something exciting in the making.

As a young developer, I often find myself in a vain pursuit of the newest and shiniest piece of technology. On that hedonic treadmill, I have time and again stumbled upon burgeoning technologies proclaiming to be the fastest, the newest, the most game-changing. Perhaps all those claims are justified in their own ways, but for the most part, it feels as if they are merely clamoring for attention. In contrast, Dgraph seems to make a persuasive case with its well-designed query language alone.

Dgraph has proven to work seamlessly with React, and its rich and intuitive query language allowed me to build and ship a Stack Overflow clone in a matter of weeks without any previous experience. The ease of iteration, coupled with the performance, indicates that Dgraph is a valuable addition to our tool belts for building modern web applications.