Releasing distributed transactions in v0.9

It all started with a Github issue.

At Dgraph, we really care about user feedback. Most of what we’ve built starting January 2017, has been based what our community (that’s you!) told us. The biggest contribution that we get from our community, is in the form of feedback. We’ll forgo any code contribution for quality feedback based on real-world usage.

Since the beginning of Dgraph, transactions were road mapped as a post v1.0 feature. Dgraph is a distributed and synchronously replicated system. Adding transactions in such a system is a hard challenge; something that we felt wasn’t worth the complexity to tackle early on.

That changed when Gustavo Niemeyer filed this issue. In this issue, he made a very convincing case for supporting transactions sooner rather than later.

So, coming back to Dgraph, if the idea is indeed to position it as a general alternative for existing databases as I’ve watched in one of your videos, please don’t make the same mistake of postponing transactions for too long, or voicing them as relevant mainly for financial transactions. The sort of consistency at stake is relevant for pretty much any application at all that uses data, even more when even basic details about a record are recorded as multiple individual terms. This would make the situation even worse than with MongoDB. –Gustavo

The arguments made by Gustavo were intriguing enough for us to look seriously in that direction. Once we started looking, evidence was everywhere. People have been complaining about lack of transactions in MongoDB. Bigtable author and Google Fellow, Jeff Dean, considered not implementing transactions in Bigtable his “biggest mistake as an engineer”.

It was clear that transactions were something that we should implement right away.

So that’s what we did. We used what we call the Blitzkrieg approach. Explaining it can be a blog post of its own, but the general idea is that a single or a very small set of engineers work initially to make changes deep into the core, which would break most things minus the core (possibly one package). And then the rest of the team helps fix up the outer shells level by level. We use this technique regularly to implement major design changes at a lightning speed.

The entire work from the reporting of the issue (Sep 13) to implementing transactions in Badger (Oct 5), to releasing v0.9 with transactions (on Nov 14), was done within a time span of two months.

In this blog post, we won’t go into the details of how Dgraph’s transactions work. There’s a lot of interesting bits there, due to the uniqueness of this challenge; so the team decided that a blog post won’t do justice to what has gone into building this amazingly distributed graph database over the past two years. Instead, we plan to write a technical paper about Dgraph’s unique design. Watch for that in the coming months!

So, instead of how it works, this blog post focuses on how to use transactions to build your application.

The transaction model

Transactions come along with a new model for how to interact with Dgraph. Previously, it has just been single queries and data mutations on their own. Now all queries and mutations are performed as part of a transaction.

Dgraph can perform read-modify-write transactions, the typical lifecycle being:

  1. Create a transaction. Go client: client.NewTxn().

  2. Execute a series of queries and mutations. Go client: txn.Mutate(...) and txn.Query(...).

  3. Finally, commit or abort the transaction. Go client: txn.Commit() and txn.Abort().

If two concurrently running transactions write to the same data, then one of the commits will fail. It’s up to the user to retry.

Why are transactions important?

Database transactions are important for any app that needs to update its state based upon its previous state in a consistent manner or has operations that need to apply in atomic units.

That covers a lot of different things. Just to name a few:

  • Marking inventory in an online shop as sold. You wouldn’t want to sell the last remaining item to two customers.

  • Paying out bets on an online poker site. It’s important to ensure the same win isn’t paid twice.

  • Inventory management for a warehouse. Restocking an item twice without seeing the new quantity could result in twice as much held stock as intended.

  • Financial transactions. When transferring money, it’s important that credits and debts on two accounts are either both applied or not applied at all.

Dgraph v0.9 introduces distributed ACID transactions with synchronous replication. What this means is that transactions work across multiple servers each holding a part of the graph, providing ACID guarantees.

Increasing throughput is still just a matter of bringing up additional dgraph instances. There is no need to worry about seeing a previous database state when querying a replica. From the point of view of a single client, once a transaction is committed its changes are guaranteed to be visible in all future transactions. These guarantees help simplify application code significantly while providing a high level of scalability and crash resilience.

Client Libraries

Dgraph exposes its API via gRPC and HTTP. However…

Transactions require some bookkeeping and state management on the client side. Because of this, it’s strongly recommended to use a client library to interact with dgraph.

At the time of writing, official Go, Java and a community-driven Javascript clients are available.

Client libraries for other languages can be implemented on top of the gRPC or HTTP APIs. The best way to approach this is to read the documentation about how to use the raw HTTP API and look at the implementations for other existing clients.

The examples in this blog post will use the Go client.

A simple login system

Prior to v0.9.0 dgraph had an upsert feature which is now removed. Upsert atomically searches and retrieves or creates and retrieves depending on whether an entity exists or not.

With transactions, an explicit upsert feature is no longer required. This is because upsert style operations can be performed atomically within a transaction.

So how is this done?

In this example, we model a simple login system, where a user has to provide an email address and password in order to gain access to the system.

If the user already exists, then the password must match. If the user doesn’t yet exist, then their password should be stored for later logins.

It’s important to do all of this in a transaction. If it’s not, then the same account might inadvertently be created twice.

Error checking and JSON marshalling/unmarshalling have been omitted for brevity:

// Create a new transaction. The deferred call to Discard
// ensures that server-side resources are cleaned up.
txn := client.NewTxn()
defer txn.Discard(ctx)

// Create and execute a query to looks up an email and checks if the password
matches.
q := fmt.Sprintf(`
    {
        login_attempt(func: eq(email, %q)) {
            checkpwd(pass, %q)
        }
    }
`, email, pass)
resp, err := txn.Query(ctx, q)

// Unmarshal the response into a struct. It will be empty if the email couldn't
// be found. Otherwise it will contain a bool to indicate if the password matched.
var login struct {
    Account []struct {
        Pass []struct {
            CheckPwd bool `json:"checkpwd"`
        } `json:"pass"`
    } `json:"login_attempt"`
}
err = json.Unmarshal(resp.GetJson(), &login)

// Now perform the upsert logic.
if len(login.Account) == 0 {
    fmt.Println("Account doesn't exist! Creating new account.")
    mu := &protos.Mutation{
        SetJson: []byte(fmt.Sprintf(`{ "email": %q, "pass": %q }`, email, pass)),
    }
    _, err = txn.Mutate(ctx, mu)
    // Commit the mutation, making it visible outside of the transaction.
    err = txn.Commit(ctx)
} else if login.Account[0].Pass[0].CheckPwd {
    fmt.Println("Login successful!")
} else {
    fmt.Println("Wrong email or password.")
}

Bank Account Transfers

The classical example for database transactions is to transfer money between two bank accounts. In this example, we have a set of bank accounts, each represented by a node in the graph. Each node is known by a uid and has its balance represented by a bal predicate.

This example was extracted from a tool we used when testing the correctness of our transaction implementation. The full source is here.

Given the uid of two accounts, we want to transfer money from one account to the other, i.e. reduce one balance and increase the other by the same amount.

It’s important that this is done in a transaction; if it isn’t, then two transfers happening concurrently could result in the net balance of all accounts changing. It could also result in double spending.

txn := s.dg.NewTxn()
defer txn.Discard(ctx)

// Get current balances for the two accounts.
q := fmt.Sprintf(`{both(func: uid(%s, %s)) { uid, bal }}`, from, to)
resp, err := txn.Query(ctx, q)
type Accounts struct {
    Both []Account `json:"both"`
}
var a Accounts
err := json.Unmarshal(resp.Json, &a)

// Perform the transfer.
a.Both[0].Bal += 5
a.Both[1].Bal -= 5
if a.Both[0].Bal < 0 || a.Both[1].Bal < 0 {
    // Abandon the transaction if there are insufficient funds.
    return
}

// Write back to dgraph.
var mu protos.Mutation
data, err := json.Marshal(a.Both)
mu.SetJson = data
_, err = txn.Mutate(ctx, &mu)
err = txn.Commit(ctx)

Conclusion

It has historically been difficult to implement transactions in NoSQL technologies. Notably, MongoDB has been working on a solution for a while.

So implementing transactions with synchronous replication is a massive milestone for Dgraph. With this complex but valuable feature, our community will be able to build apps on top of dgraph without having to worry about tricky data integrity issues!