When I started building the initial version of the Dgraph Go client, we were looking for a serialization format that was fast, easy to use, and supported multiple language runtimes. We finally implemented our client using Protocol Buffers, which, according to our benchmarks, gave us twice the speed while consuming about two-thirds less memory compared to JSON.
Dgraph v0.2 already supported serialization to JSON for the HTTP client. For our language-specific drivers, we wanted something that would give us a performance improvement over JSON. Though we use Flatbuffers for everything internally, they lacked support for encoding recursive data structures. Protocol buffers seemed the right choice because they work with most modern languages and can encode recursive data structures efficiently.
To use protocol buffers, you define the messages (the data structures that form the basis of communication) in a .proto file and then compile it using the protocol buffer compiler. For communication, we use gRPC, an open-source RPC framework by Google. gRPC requires services to be defined in the same .proto file. Using gRPC allows us to communicate in a binary format, which is faster than exchanging JSON-formatted results.
// The Node object, which can have child nodes and properties.
message Node {
  uint64 uid = 1;
  string xid = 2;
  string attribute = 3;
  repeated Property properties = 4;
  repeated Node children = 5; // Each node can have multiple children.
}
message Request {
  string query = 1;
  // and other fields
}

message Response {
  Node n = 1;
  // and other fields
}
// Dgraph service used for communication between the Dgraph server
// and client over gRPC.
service Dgraph {
  rpc Query (Request) returns (Response) {}
}
You can find the full .proto file here. The .proto file can be used to generate the corresponding Go code using the protoc compiler and the runtime library.
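For reference, the generation step looks something like this; the file name graphresponse.proto is an assumption here, and the plugins=grpc option tells the Go code generator to also emit the gRPC service stubs:

$ protoc --go_out=plugins=grpc:. graphresponse.proto

Once the bindings are generated, querying the server from Go is a standard gRPC call. Below is a minimal sketch; the import path, server address, and query string are illustrative assumptions, while the NewDgraphClient constructor and the Request/Response types follow from the service definition above:

package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/grpc"

    // Hypothetical import path for the package generated from the .proto file.
    "github.com/dgraph-io/dgraph/query/graph"
)

func main() {
    // Assumed address of a running Dgraph server's gRPC port.
    conn, err := grpc.Dial("127.0.0.1:8080", grpc.WithInsecure())
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // NewDgraphClient is generated by protoc from the Dgraph service above.
    c := graph.NewDgraphClient(conn)
    resp, err := c.Query(context.Background(), &graph.Request{Query: "{ me(_xid_: alice) { name } }"})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.N)
}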
In Go, you can easily measure how your algorithm does (in terms of time and space) by writing benchmarks. Go benchmarks are unique in that they iterate over the test code b.N times, where b.N is adjusted until the benchmark function lasts long enough to be timed reliably.
To see how our protocol buffer implementation fared against the JSON one, we wrote benchmarks for both. But first, let's understand what a benchmark is and how we can interpret its results.
Let's write a simple function that just appends integers to a list.
func addToList() {
    // Start with a slice of length (and capacity) 10; appending beyond
    // that forces the runtime to repeatedly grow the backing array.
    list := make([]int, 10)
    for i := 0; i < 1000; i++ {
        list = append(list, i)
    }
}
Here's the benchmarking code:
func BenchmarkAddToList(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        addToList()
    }
}
We can run the above benchmark using go test -bench=. Here, the Go benchmark runner repeatedly calls the function with different values of b.N until it can be timed reliably.
$ go test -bench=.
BenchmarkAddToList-4 200000 8153 ns/op 22624 B/op 7 allocs/op
Here's what the output means:
- BenchmarkAddToList-4: the name of the benchmark; the suffix is the value of GOMAXPROCS it ran with.
- 200000: the number of iterations (b.N) the benchmark ran for.
- 8153 ns/op: the average time taken per iteration.
- 22624 B/op: the average number of bytes allocated per iteration.
- 7 allocs/op: the average number of heap allocations per iteration.
If we change the line that initializes the slice to
list := make([]int, 0, 1000)
and run the benchmarks again, we get better results:
BenchmarkAddToList-4 1000000 1618 ns/op 0 B/op 0 allocs/op
The allocs/op dropped because we pre-allocated the slice with enough capacity, so the runtime never has to allocate a bigger backing array and copy the elements over as we append. The B/op dropped for the same reason: no intermediate backing arrays are allocated and discarded while the slice grows. (In fact, both numbers drop to zero because the slice now has a constant size and never escapes the function, so the compiler can keep it entirely on the stack.) Note that this also changes behavior slightly: the original version started with ten zero-valued elements before the appends.
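For completeness, here's the revised function with the pre-allocated slice:

func addToList() {
    // Allocate zero length but capacity 1000 up front,
    // so append never needs to grow the backing array.
    list := make([]int, 0, 1000)
    for i := 0; i < 1000; i++ {
        list = append(list, i)
    }
}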
After implementing serialization using protocol buffers, we wrote benchmark tests for our ToJSON and ToProtocolBuffer methods to get exact metrics. These methods convert the internal SubGraph data structure to a byte array, which is transferred over the network. Benchmark tests are an excellent way to compare different implementations or to check whether new code leads to any improvement.
// Benchmark test for the ToProtocolBuffer method.
func benchmarkToPB(file string, b *testing.B) {
    b.ReportAllocs()
    var sg SubGraph
    var l Latency
    // Read the gob-encoded SubGraph data structure from a file.
    f, err := ioutil.ReadFile(file)
    if err != nil {
        b.Fatal(err)
    }
    buf := bytes.NewBuffer(f)
    dec := gob.NewDecoder(buf)
    if err := dec.Decode(&sg); err != nil {
        b.Fatal(err)
    }
    // Exclude the setup above from the measurements.
    b.ResetTimer()
    // The benchmark loop.
    for i := 0; i < b.N; i++ {
        pb, err := sg.ToProtocolBuffer(&l)
        if err != nil {
            b.Fatal(err)
        }
        r := new(graph.Response)
        r.N = pb
        var c Codec
        if _, err = c.Marshal(r); err != nil {
            b.Fatal(err)
        }
    }
}
// Benchmark test for the ToJSON method.
func benchmarkToJson(file string, b *testing.B) {
    b.ReportAllocs()
    var sg SubGraph
    var l Latency
    // Read the gob-encoded SubGraph data structure from a file.
    f, err := ioutil.ReadFile(file)
    if err != nil {
        b.Fatal(err)
    }
    buf := bytes.NewBuffer(f)
    dec := gob.NewDecoder(buf)
    if err := dec.Decode(&sg); err != nil {
        b.Fatal(err)
    }
    // Exclude the setup above from the measurements.
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := sg.ToJSON(&l); err != nil {
            b.Fatal(err)
        }
    }
}
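These helpers are parameterized by the file holding the gob-encoded SubGraph, so the named benchmarks that appear in the results below are thin wrappers along these lines (the file path is a hypothetical placeholder):

func BenchmarkToPB_1000_Director(b *testing.B) {
    benchmarkToPB("testdata/directors_1000.gob", b)
}

func BenchmarkToJSON_1000_Director(b *testing.B) {
    benchmarkToJson("testdata/directors_1000.gob", b)
}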
You can find the complete benchmark tests here. There are some differences in the algorithms that convert the internal SubGraph structure to JSON and to Protocol Buffers; you can have a look at the code responsible for this here.
Using these benchmark tests, we were able to improve our metrics by over 50% by switching from interface{} to []byte for ObjectValue as part of this change. Later, when we shifted to Gogo Protobuf, we compared these benchmarks against the previous ones to confirm the improvement.
This is how the final benchmark results compare for a query which returns 1000 entities in the result.
BenchmarkToJSON_1000_Director-2 500 2512808 ns/op 560427 B/op 9682 allocs/op
BenchmarkToPB_1000_Director-2 2000 1338410 ns/op 196743 B/op 3052 allocs/op
The benchmarks show that the ToPB method is almost 2x faster than ToJSON, taking far fewer nanoseconds per operation. The bytes allocated per operation show that ToPB allocates 65% less memory than ToJSON. You can find more information about these benchmarks, and what we changed to get here, in our README.
BenchmarkToJSONUnmarshal_1000_Director-4 1000 1279297 ns/op 403746 B/op 5144 allocs/op
BenchmarkToPBUnmarshal_1000_Director-4 3000 489585 ns/op 202256 B/op 5522 allocs/op
We can see that unmarshaling on the client would also be 2.6x faster for protocol buffers compared to JSON, with ToPB allocating 50% less memory than ToJSON.
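Concretely, the unmarshal benchmarks boil down to comparing two client-side decoding paths like the ones sketched below; pbData and jsonData stand for the bytes received over the wire, and graph.Response is the generated type (a sketch, not the exact benchmark code):

import (
    "encoding/json"

    "github.com/golang/protobuf/proto"
)

// decodeBoth sketches the two decoding paths being compared.
func decodeBoth(pbData, jsonData []byte) error {
    // Protocol buffer path: decode straight into the generated struct.
    resp := new(graph.Response)
    if err := proto.Unmarshal(pbData, resp); err != nil {
        return err
    }

    // JSON path: decode the same result into a generic value.
    var result map[string]interface{}
    return json.Unmarshal(jsonData, &result)
}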
If both your server and client are written in Go, then we recommend Gogo Protobuf over Golang Protobuf as the runtime library. As the benchmarks below show, Gogo has 2.3x faster marshaling while allocating 80% fewer bytes per operation, and 1.5x faster unmarshaling, compared to Golang protobuf.
BenchmarkToPBMarshal_1000_Director-4 3000 360545 ns/op 226504 B/op 22 allocs/op # Golang protobuf
BenchmarkToPBMarshal_1000_Director-4 10000 156820 ns/op 49152 B/op 1 allocs/op # Gogo protobuf
BenchmarkToPBUnmarshal_1000_Director-4 2000 733481 ns/op 200241 B/op 5523 allocs/op # Golang protobuf
BenchmarkToPBUnmarshal_1000_Director-4 3000 487745 ns/op 202256 B/op 5522 allocs/op # Gogo protobuf
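Switching is mostly a matter of regenerating the bindings with one of Gogo's protoc plugins; for example, with protoc-gen-gofast installed, something like the following (the file name is again an assumption):

$ protoc --gofast_out=plugins=grpc:. graphresponse.proto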
Note that Gogo protobuf generates code only for Go as of now. However, if you are using some other language, this isn't a problem: Gogo protobuf produces the same wire format and is backward compatible with Golang protobuf. Our Python and Java clients can still interact with the server (which does its marshaling using Gogo), making Gogo a safe choice.
We would love to hear about your interaction with the Dgraph server.