GraphQL atop a Graph Database

When I first learned about GraphQL I was confused - at least by the name. The name “GraphQL” suggested a query language to access graphs, but everyone was using it to dynamically query relational and other data. It looked more like a flexible version of REST to me. I have since realized that GraphQL is a technique for turning stored graph data into hierarchical JSON responses, so GraphQL is indeed closely related to graphs.

GraphQL allows callers to specify JSON responses dynamically

For quick background, a GraphQL query is a template for a JSON response, and is typically executed via HTTP. Unlike REST, where each endpoint returns a fixed payload, GraphQL requests and payloads are not fixed: a GraphQL schema is a menu of all data that can be queried, and the clients determine what, exactly, is pulled back.

This is simpler and more powerful than defining a new REST format or OpenAPI spec and possibly waiting for another team to write the desired services for each new API. Callers (app developers) query what they need, when and how they need it.
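To make "executed via HTTP" concrete, here is a minimal sketch of how a caller might package a query, assuming the common GraphQL-over-HTTP convention of a JSON POST body with "query" and optional "variables" keys. The URL is a placeholder and build_graphql_request is a hypothetical helper name; the point is only that the client, not the server, writes the field list.

```python
import json
import urllib.request
from typing import Optional

# Placeholder endpoint (an assumption): point this at your own GraphQL server.
GRAPHQL_URL = "http://localhost:8080/graphql"

# The caller decides which fields come back; no new endpoint is needed.
QUERY = """
{
  Hero {
    name
    appearsIn
  }
}
"""


def build_graphql_request(url: str, query: str,
                          variables: Optional[dict] = None) -> urllib.request.Request:
    """Wrap a GraphQL query in the conventional JSON POST body."""
    body: dict = {"query": query}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_graphql_request(GRAPHQL_URL, QUERY)
# urllib.request.urlopen(req) would send it and return the JSON response.
```

Asking for different data means editing QUERY, not asking another team for a new REST resource.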

For instance:

{
    Hero {
        name
        appearsIn
        OriginStory(filter: { pubDate: { gt: "1990-01-01" } }) {
            planet
            pubDate
        }
    }
}

Might return

{
    "data": {
        "Hero": {
            "name": "R2-D2",
            "appearsIn": [
                "EMPIRE",
                "JEDI"
            ],
            "OriginStory": [{
                "planet": "Naboo",
                "pubDate": "2003-05-19"
            }]
        }
    }
}

This is super-flexible, allowing all callers to get exactly what they want without writing new APIs. These straightforward templates resemble JSON and retrieve only the data desired. Despite their simplicity, they specify the nesting, filtering and data joins required. Without any SQL statements, join clauses, or a new OpenAPI schema, nested data comes back: we see the main Hero entity, with its related OriginStory entities joined as a sub-list.
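The "template" idea can be sketched in a few lines: walk the requested fields, copy leaves, recurse into nested selections, and apply a filter to list items. This is a toy in-memory executor, not how Dgraph or any GraphQL server is implemented; Selection, execute, and the extra "id" field and second story record are hypothetical, added only so there is something to leave out.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical stored record; "id" and the second story are made up for illustration.
HERO = {
    "id": "hero-1",
    "name": "R2-D2",
    "appearsIn": ["EMPIRE", "JEDI"],
    "OriginStory": [
        {"planet": "Naboo", "pubDate": "2003-05-19"},
        {"planet": "Tatooine", "pubDate": "1977-05-25"},
    ],
}


class Selection:
    """Requested fields (None marks a leaf) plus an optional filter for list items."""

    def __init__(self, fields: Dict[str, Optional["Selection"]],
                 where: Optional[Callable[[dict], bool]] = None):
        self.fields = fields
        self.where = where


def execute(obj: Any, sel: Selection) -> Any:
    out = {}
    for name, sub in sel.fields.items():
        value = obj.get(name)
        if sub is None or value is None:
            out[name] = value  # leaf field (or missing value): copy as-is
        elif isinstance(value, list):
            kept = [v for v in value if sub.where is None or sub.where(v)]
            out[name] = [execute(v, sub) for v in kept]
        else:
            out[name] = execute(value, sub)
    return out


# Mirrors the example query; ISO-8601 date strings compare correctly as text.
query = Selection({
    "name": None,
    "appearsIn": None,
    "OriginStory": Selection(
        {"planet": None, "pubDate": None},
        where=lambda story: story["pubDate"] > "1990-01-01",
    ),
})
result = {"data": {"Hero": execute(HERO, query)}}
```

Here result has the same shape as the sample response above: the unrequested id field is dropped, and the pre-1990 story is filtered out of the nested list.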

But what does this have to do with graphs?

GraphQL is fundamentally a syntax for turning a graph into a tree.

It turns out that the data in any database can be thought of as a graph. The PK/FK relations in a relational database organize all rows into a (logical) graph. JSON nesting relationships clearly form a tree, and a tree is also a graph. Graph databases are the only technology that physically stores and optimizes data as a graph, but logically, every one of these stores holds a graph.
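As a small illustration of the "logical graph" in relational data, the sketch below turns rows into nodes keyed by (table, primary key) and turns each row of a link table into a labeled edge. The rows are hypothetical; they reuse the table and column names from the SQL fragment later in this post, with NAME and PLANET as assumed columns, and rows_to_graph is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows; NAME and PLANET are assumed columns.
HERO = [{"HERO_ID": 1, "NAME": "R2-D2"}]
ORIGIN = [{"ORIGIN_ID": 10, "PLANET": "Naboo"}]
CHAR_STORY_RELS = [{"HID": 1, "OID": 10, "TYPE": "originstory"}]

NodeId = Tuple[str, int]  # (table name, primary key)


def rows_to_graph(heroes, origins, rels):
    """Each row becomes a node; each FK pair in the link table becomes a labeled edge."""
    nodes: Dict[NodeId, dict] = {("HERO", h["HERO_ID"]): h for h in heroes}
    nodes.update({("ORIGIN", o["ORIGIN_ID"]): o for o in origins})
    edges: Dict[NodeId, List[Tuple[str, NodeId]]] = defaultdict(list)
    for r in rels:
        edges[("HERO", r["HID"])].append((r["TYPE"], ("ORIGIN", r["OID"])))
    return nodes, edges


nodes, edges = rows_to_graph(HERO, ORIGIN, CHAR_STORY_RELS)
# edges[("HERO", 1)] == [("originstory", ("ORIGIN", 10))]
```

Nothing about the data changed; the foreign keys were edges all along, just stored as matching column values.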

So GraphQL is great, but using it with most databases runs into an impedance mismatch, since modern applications call for tree-structured JSON data rather than the tables or documents the data is stored in. GraphQL bridges this gap by converting arbitrary stored data into JSON return values. Merely nesting two entities in a GraphQL query instructs the underlying system to navigate (across graph-like links) from the first entity to the second, and then on to each entity's fields. Internally, most GraphQL systems navigate each of these links using a “resolver”: a bit of code or config that tells the system how to get from one entity to another, or to a field. In the example above, that means getting from Hero to OriginStory, and from OriginStory to planet.
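A minimal sketch of the resolver idea, assuming a simple registry keyed by (type, field): every step of the nested query, Query to Hero, Hero to OriginStory, OriginStory to planet, is a separate function someone has to write. RESOLVERS, FIELD_TYPES and resolve are hypothetical names, and the dictionaries stand in for real SQL, document, or REST calls; real GraphQL libraries have their own resolver signatures.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical backing data; a real resolver would call SQL, a document store, or REST.
HEROES = {1: {"name": "R2-D2", "origin_ids": [10]}}
ORIGINS = {10: {"planet": "Naboo", "pubDate": "2003-05-19"}}

# One resolver per (type, field): each knows how to take exactly one step.
RESOLVERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {
    ("Query", "Hero"): lambda _root: HEROES[1],
    ("Hero", "name"): lambda hero: hero["name"],
    ("Hero", "OriginStory"): lambda hero: [ORIGINS[i] for i in hero["origin_ids"]],
    ("OriginStory", "planet"): lambda story: story["planet"],
}

# The type each non-leaf field returns, so the walker knows which resolvers apply next.
FIELD_TYPES = {("Query", "Hero"): "Hero", ("Hero", "OriginStory"): "OriginStory"}


def resolve(type_name: str, parent: Any, selection: Dict[str, Optional[dict]]) -> dict:
    """Walk a nested selection, calling one resolver for every field."""
    out = {}
    for field, sub in selection.items():
        value = RESOLVERS[(type_name, field)](parent)
        if sub is None:
            out[field] = value
            continue
        child_type = FIELD_TYPES[(type_name, field)]
        if isinstance(value, list):
            out[field] = [resolve(child_type, v, sub) for v in value]
        else:
            out[field] = resolve(child_type, value, sub)
    return out


result = resolve("Query", None, {"Hero": {"name": None, "OriginStory": {"planet": None}}})
```

Adding a field to the schema means adding another entry to RESOLVERS; that per-field code is where the graph structure hides.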

Resolvers, over-fetching and under-fetching

But at what cost? There is no free lunch if you use a non-graph database to represent a logical graph. Non-graph data stores necessitate writing “resolvers” to tell a GraphQL engine how to query, join or navigate data structures (tables, JSON documents, REST services) in the underlying data stores to overcome an impedance mismatch.

In a graph database, no resolvers are needed. Every navigation in the GraphQL query simply traverses a relationship (edge) in the graph database.
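For contrast, here is a toy graph store in which one generic traverse function answers the same nested selection with no per-field code: a nested field follows the edge of the same name, and a leaf field reads a property. NODES, EDGES and traverse are hypothetical names, and this is not Dgraph's API or storage format; it only shows why edge-shaped storage lines up with nested queries.

```python
from typing import Any, Dict, List

# A toy graph store: node properties plus named edges to other node ids (all hypothetical).
NODES: Dict[str, Dict[str, Any]] = {
    "hero1": {"name": "R2-D2", "appearsIn": ["EMPIRE", "JEDI"]},
    "story1": {"planet": "Naboo", "pubDate": "2003-05-19"},
}
EDGES: Dict[str, Dict[str, List[str]]] = {
    "hero1": {"OriginStory": ["story1"]},
}


def traverse(node_id: str, selection: Dict[str, Any]) -> Dict[str, Any]:
    """One generic walker: nested fields follow edges, leaf fields read properties."""
    out: Dict[str, Any] = {}
    for field, sub in selection.items():
        if sub is None:
            out[field] = NODES[node_id].get(field)
        else:
            targets = EDGES.get(node_id, {}).get(field, [])
            out[field] = [traverse(t, sub) for t in targets]
    return out


result = traverse("hero1", {"name": None, "OriginStory": {"planet": None}})
```

For simplicity every edge returns a list here; a fuller sketch would consult the schema to decide between a single object and a list.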

Considering the above Hero example on a relational database, the resolvers would be configured using SQL snippets, such as JOIN and WHERE clauses, that specify the join logic across a relationship table. Something pretty ugly, like:

FROM HERO h JOIN CHAR_STORY_RELS c ON h.HERO_ID = c.HID JOIN ORIGIN o ON c.OID = o.ORIGIN_ID WHERE c.TYPE = 'originstory'
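To see where such a snippet ends up, here is a sketch of a Hero.OriginStory resolver built around the SQL fragment above, run against an in-memory SQLite database. The schema reuses the fragment's table and column names; NAME and PLANET are assumed columns, and resolve_origin_stories is a hypothetical name.

```python
import sqlite3

# Hypothetical schema from the fragment's names; NAME and PLANET are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HERO (HERO_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ORIGIN (ORIGIN_ID INTEGER PRIMARY KEY, PLANET TEXT);
CREATE TABLE CHAR_STORY_RELS (HID INTEGER, OID INTEGER, TYPE TEXT);
INSERT INTO HERO VALUES (1, 'R2-D2');
INSERT INTO ORIGIN VALUES (10, 'Naboo');
INSERT INTO CHAR_STORY_RELS VALUES (1, 10, 'originstory');
""")


def resolve_origin_stories(hero_id: int) -> list:
    """Resolver for Hero.OriginStory: the join logic lives in hand-written SQL."""
    sql = (
        "SELECT o.PLANET "
        "FROM HERO h "
        "JOIN CHAR_STORY_RELS c ON h.HERO_ID = c.HID "
        "JOIN ORIGIN o ON c.OID = o.ORIGIN_ID "
        "WHERE c.TYPE = 'originstory' AND h.HERO_ID = ?"
    )
    return [{"planet": row[0]} for row in conn.execute(sql, (hero_id,))]
```

Calling resolve_origin_stories(1) returns [{"planet": "Naboo"}]; every other relationship in the schema needs a similar hand-maintained function.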

In a JSON database, it may be done via JSON-path fragments that navigate into sub-properties, together with even more complex logic to find links among disparate JSON documents. Consider this MongoDB code for a similar, simple lookup:

db.HERO.aggregate([
    // assumes each HERO document stores the ORIGIN_ID of its origin story
    { $lookup: { from: "ORIGIN", localField: "ORIGIN_ID", foreignField: "ORIGIN_ID", as: "origin" } },
    { $unwind: "$origin" },
    { $project: { _id: 0, HERO_ID: 1, NAME: 1, ORIGIN_ID: "$origin.ORIGIN_ID", PLANET: "$origin.PLANET" } }
])

All these approaches are a burden, and most lead to inefficient over- and under-fetching. Typically, NoSQL databases over-fetch (retrieving the entire OriginStory document to access one field), while RDBMSs tend to require complex joins or N+1 queries (one query to fetch N parent entities, then one more per parent to fetch its related sub-entities) to assemble parents with their sub-entities.
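The N+1 pattern is easy to reproduce. This sketch, assuming a hypothetical two-table SQLite schema, counts round trips for naive per-field resolvers versus one hand-written join that is then regrouped into a tree. CountingDB and both function names are illustrative, not from any library.

```python
import sqlite3
from typing import List

# Hypothetical schema and data: three heroes, one origin story each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HERO (HERO_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE STORY (HERO_ID INTEGER, PLANET TEXT);
INSERT INTO HERO VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO STORY VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
""")


class CountingDB:
    """Counts round trips so the two strategies can be compared."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.count = 0

    def run(self, sql: str, params: tuple = ()) -> list:
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def heroes_n_plus_one(db: CountingDB) -> List[dict]:
    """Per-field resolvers: 1 query for the heroes, then 1 more per hero."""
    heroes = db.run("SELECT HERO_ID, NAME FROM HERO ORDER BY HERO_ID")
    return [
        {"name": name,
         "OriginStory": [{"planet": p} for (p,) in
                         db.run("SELECT PLANET FROM STORY WHERE HERO_ID = ?", (hid,))]}
        for hid, name in heroes
    ]


def heroes_single_join(db: CountingDB) -> List[dict]:
    """Hand-written join: 1 query, then regroup the flat rows into a tree."""
    rows = db.run(
        "SELECT h.HERO_ID, h.NAME, s.PLANET FROM HERO h "
        "LEFT JOIN STORY s ON s.HERO_ID = h.HERO_ID ORDER BY h.HERO_ID"
    )
    by_id: dict = {}
    for hid, name, planet in rows:
        hero = by_id.setdefault(hid, {"name": name, "OriginStory": []})
        if planet is not None:
            hero["OriginStory"].append({"planet": planet})
    return list(by_id.values())
```

With three heroes the naive version makes four queries and the join makes one, but the join repeats each hero's columns on every child row and needs its own regrouping code, which is more of the resolver burden described above.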

Graph databases to the rescue

Dgraph does not require code to define the lookups of fields and relationships, nor does Dgraph require a separate data model that must then be mapped to and from the GraphQL schema. The GraphQL model is the data model, and walking the GraphQL relationships translates, with no configuration at all, to navigating relationships in the graph. No resolvers. No N+1 query problem. No join operations. Period.

And no slow join operations. I’ve mostly been talking about the programming burden so far, but executing a GraphQL query without invoking piles of resolver code is also faster. In Dgraph in particular (as opposed to other graph databases), GraphQL queries are native, so they are planned and optimized directly as graph queries.

Seasoned programmers know that whatever takes a long time to write for version 1 takes even longer to maintain over the next five years. With GraphQL, the resolvers become a maintenance headache and slow down the pace of change. The point of GraphQL was always to allow faster, more agile data access. Removing the non-graph DB impedance mismatch therefore speeds future development and maintenance as well. GraphQL schema changes in Dgraph result in a new queryable structure in seconds.

Summing it up

It is not immediately obvious that GraphQL is querying graph data, because so many people use non-graph data stores behind GraphQL, and the return values are JSON trees. So with a relational back-end and a tree-structured result, where’s the graph? I hope this blog post illustrates that GraphQL is always querying a logical graph, and the graph structure is hiding in the piles of resolver code that most implementations require.

Any graph database will at least partially remove this impedance mismatch, and Dgraph in particular natively accepts GraphQL schemas as the database schema, so any GraphQL schema is updatable and queryable seconds after it is modified in Dgraph. Dgraph also optimizes and directly executes GraphQL queries without requiring a resolver or translation layer, such as an Express server.

No resolvers. One data model for both database and query. No REST service specifications. Full flexibility. Blazing performance.

It’s a great combination that allows agile, efficient data access for all data consumers.