Dgraph started around end-August, picked up steam in mid-October, and v0.1 was released in early-December, 2015. From one, the contributors grew to 46, with the project amassing over 4000 Github stars over the past two years. 2190 commits (we squash our branches), 277 branches and 25 releases later, we’re proud to announce that Dgraph has reached v1.0, our first production-ready release.
To give a bit of a background – Dgraph, as a graph database, is designed to excel at the weaknesses of traditional relational databases: traversing relationships and efficiently executing complex joins at scale. Relational databases are not going anywhere, and they would continue to play a huge role in tech. But, at the same time, there’s a big need for handling highly interconnected, sparse data with flexible schema, to truly understand user queries. Google is going this way, and so is the rest of the world. This is what inspired us to build Dgraph back in 2015, and this basic premise continues to shape the project.
Our investors immediately saw the value of what we were building and where the database market is headed. While it is hard as an outsider to raise funds from the valley, we got backing from some of the smartest investors. We’re thrilled to announce that we have raised $3M in seed funding, led by Salil Deshpande of Bain Capital Ventures, joined by Mike Cannon-Brookes of Atlassian, Blackbird Ventures, and AirTree Ventures.
With experience in early-stage infrastructure software and open source, Salil will make an invaluable asset to Dgraph Labs by joining the board. He’s been on Midas list of top 100 VCs, with prominent investments in Mulesoft, Lending Club and, Redis Labs. This funding round comes at the right time for us to expand our offerings to the cloud, and continue to build the most advanced graph database in the world.
For this next phase of Dgraph, we’re moving our engineering team to the SF bay area, which is where the inspiration of Dgraph came from, and where we’ll have access to an unparalleled level of knowledge, talent, and market. At the same time, we’re opening an office in Bangalore.
At Dgraph, our mission is to build world’s best graph database. We’re the only graph database which inspires to become the primary database in the tech stack. This idea has been the driving factor behind the design choices of Dgraph, which we’ll explore now.
Dgraph is built to scale horizontally. Modern data sets have long grown beyond the capabilities of a single server. However, most SQL and graph databases are limited by design to store the universal data set into a single machine. This has many disadvantages, including introducing a single point of failure and being disproportionately more expensive to run.
Building a distributed graph database is a hard challenge. Graph DBs must provide a superset of relational database operations, including storing the data in a way which can execute constant time edge traversals and arbitrarily deep joins, without a view of the universal data set. Most graph databases are, therefore, either a single server architecture or a graph layer built on top of other databases, where they can get this vantage point of universal view. This leads to either lack of scalability, or lack of performance, choose at least one.
Dgraph, on the other hand, was designed keeping the problems of a fast-growing company in mind. It not only shards the data in a way where it can execute queries (joins, traversals, filters, sorts, paginations) without this universal view, it also minimizes the number of network calls and disk seeks required to execute these queries. Both of these factors become prominent in datasets spanning hundreds of servers. Not only that, as you add more servers, data automatically gets sharded and moved to fill the new ones. Dgraph automatically rebalances shards among servers to ensure that load is evenly distributed.
While being distributed, Dgraph provides lock-free ACID transactions with snapshot isolation, designed for low latency. Transactions make it a lot easier for application developers to think through database behavior, removing any data integrity issues.
Dgraph does away with the whole master-slave replication architecture. Early on, we decided against providing an eventual consistency model. In that model, a replica which is running behind the master would not show latest writes, and the developer needs to handle this in the application logic. We want Dgraph system to be reliable and consistent, where scalability is an operational feature, not a hurdle application developers need to code around.
Instead, Dgraph provides consistent (synchronous) replication, utilizing Raft, a consensus algorithm, allowing k servers dying in a 2k+1 replication setting, without affecting any reads or writes. Each data shard can be replicated an odd number of times, and replicas located in different datacenters, or availability zones to allow faster communication from your clients. Dgraph guarantees atomic consistency of writes, which means irrespective of which replica is hit for reading, any write done before is guaranteed to be available. This guarantee makes it a lot easier to build applications, which can then serve queries from the closest datacenter to the end-user.
Despite the focus on distribution, Dgraph performs exceptionally well as a single server. In fact, many Dgraph users run it on a single server and benefit from the amazing performance it brings. You can read more about the capabilities of Dgraph, in this recent post.
The next chapter of Dgraph is very exciting. We’re planning to provide a managed hosted service, to make it easier for adopters to use Dgraph without any of the operational overhead. We’ll also be significantly improving our user interface, to allow cluster management and advanced data exploration.
They say the true test of any distributed system is Jepsen, and in the coming year, we’re going to be working with Kyle Kingsbury to help set up the tests for Dgraph. Simultaneously, we’ll push our design to span globally, with automatic replication across geographically distributed regions, to become the Spanner of Graph Databases.
We’ll be expanding our list of languages to include Open Cypher and Gremlin. Both of these are very popular graph languages, and would make it easier for existing graph users to switch to Dgraph. We also plan to support the standard version of GraphQL.
We’re really excited about v1.0 release of Dgraph. Pushing it this far has been a particularly challenging road, full of experimentation. Dgraph is the only graph database which is building for native geographical distribution of data, while providing efficient arbitrarily deep joins and traversals. Those are hard problems that we have solved.
Lacking a guiding Google paper on distributed graph databases, we built a system which puts together the best of Bigtable (NoSQL) and Spanner (NewSQL). Within Dgraph, we term this NewGraph – a graph database which is horizontally scalable, multi-versional, transactional, synchronously replicated and inherently capable as a primary database. In fact, we think a paper describing the unique design of Dgraph is quite important, so we’re working towards writing one. Being engineers, we can’t think of a better way to celebrate the last over two years of hard work.
With Dgraph, we hope to break the cycle of many fast-growing companies from looking for a graph database, trying something, giving up and building their own graph system; to looking for a graph database, finding Dgraph and building applications on top.
While traditional databases only serve data, graph databases allow machine interpretation of data, which is crucial to build smart systems. We look forward to putting Dgraph into the hands of users who need the stability of v1.0, and hear about all smart use cases they are building. And we’re confident that Dgraph would make their use cases successful. We have taken giant strides in graph database world, and we promise to continue taking those, so our users can push the envelope for what’s possible and maintain the competitive edge.
Also read coverage on TechCrunch.