HIGHLIGHTS

  • JanusGraph is not a native graph database.
  • JanusGraph is not self-contained and relies on third-party storage backends (mostly NoSQL).
  • If JanusGraph is used with Cassandra or HBase, it is a distributed database, but it won’t have ACID transactions.
  • If JanusGraph is used with BerkeleyDB, it has ACID transactions, but it won’t be distributed.

Dgraph

JanusGraph

Native GraphQL Support

Yes

The only database to natively support GraphQL, which gives it the capacity to process GraphQL queries in parallel with high performance

No

JanusGraph’s query language is Gremlin. Reference

Distributed Graph Database

Distributed, with the ability to run the same query everywhere as if querying a single database.

JanusGraph is distributed only with Apache Cassandra and Apache HBase; note that BerkeleyDB JE is a non-distributed database. HBase gives preference to consistency at the expense of yield, while Cassandra gives preference to availability at the expense of harvest. Reference

Distributed ACID Transactions

  • Supported and Jepsen-tested
  • Synchronous replication with immediate consistency, meaning any client can read the latest write
  • Open source
  • Reference
JanusGraph transactions are not necessarily ACID. They can be so configured on BerkeleyDB, but they are not generally so on Cassandra or HBase, where the underlying storage system does not provide serializable isolation or multi-row atomic writes and the cost of simulating those properties would be substantial. Reference

Sharding

  • Predicate-based sharding. Avoids N+1 problem and network broadcasts when running a query in high fanout scenarios. This ensures low-latency query execution, irrespective of the size of the cluster or the number of intermediate results. Reference
  • Consistent production level latencies and consistent queries. Reference
  • Automatic sharding
  • Sharding a single predicate on the roadmap
  • When JanusGraph is deployed on a cluster of multiple storage backend instances, the graph is partitioned across those machines. By default, JanusGraph uses a random partitioning strategy that randomly assigns vertices to machines.
  • When the graph is small or accommodated by a few storage instances, it is best to use random partitioning for its simplicity. As a rule of thumb, one should strongly consider enabling explicit graph partitioning and configure a suitable partitioning heuristic when the graph grows into the 10s of billions of edges.
  • Reference
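As a sketch, the default random partitioning described above amounts to hashing a vertex identifier onto one of the available storage instances. The function and names below are illustrative, not JanusGraph’s actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor assigns a vertex id to one of n storage instances by hashing
// the id. Placement is independent of graph structure, which is what makes
// random partitioning simple — and what makes traversals cross machines.
func partitionFor(vertexID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(vertexID))
	return h.Sum32() % n
}

func main() {
	for _, v := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s -> instance %d\n", v, partitionFor(v, 4))
	}
}
```

An explicit partitioning heuristic would replace the hash with a placement strategy that tries to co-locate frequently co-traversed vertices.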

Consistent Replication

Synchronous replication across all replicas
  • Only HBase has native support for strong consistency at the row level (Reference). Even so, the JanusGraph documentation explains the use of locks for data consistency on HBase here.
  • Cassandra has specific configurations for replication. In general, higher consistency levels are more consistent and robust but have higher latency. Reference

Linearizable Reads

Strong (sequential) consistency across clients. Reference
  • Apache Cassandra and Apache HBase are both eventually consistent storage backends, which means JanusGraph must obtain locks in order to ensure consistency. Because of the additional steps required to acquire a lock when committing a modifying transaction, locking is a fairly expensive way to ensure consistency, and it can lead to deadlock when many concurrent transactions try to modify the same elements in the graph. Reference
  • JanusGraph first persists all graph mutations to the storage backend. If the primary persistence into the storage backend succeeds but the secondary persistence into the indexing backends or the logging system fails, the transaction is still considered successful because the storage backend is the authoritative source of the graph. This can create inconsistencies with the indexes and logs. To automatically repair such inconsistencies, JanusGraph can maintain a transaction write-ahead log, which is enabled through the configuration. Reference

Correctness and Durability Testing

JanusGraph is not Jepsen-tested

High Availability

Yes

  • HA Cluster Setup is explained here
  • HA Cluster setup is available in Community Edition
  • High availability depends on the backend configuration. Both HBase and Cassandra can be highly available.
  • If an instance fails, i.e. is not properly shut down, JanusGraph still considers it to be active and expects its participation in cluster-wide operations, which subsequently fail because this instance did not participate in or acknowledge the operation. In this case, the user must manually remove the failed instance record from the cluster and then retry the operation. Reference

Transparent Data Encryption

Yes

Database files are encrypted at rest with a user-specified key

This depends on the backend storage system. HBase and Oracle Berkeley DB have encryption-at-rest options, although it is not documented how they can be used with JanusGraph. Reference for HBase, and for Berkeley DB.

Query Languages

  • GraphQL
  • GraphQL± (a variation of GraphQL supporting advanced features)
Gremlin Query Language

Management of Runaway Queries

  • Context cancellation that works across clients and servers, so a context cancellation at the client level automatically cancels the query on all involved servers.
  • OpenCensus integration, which allows distributed tracing all the way from app to Dgraph cluster and back.
  • Open standards for query context cancellation and tracking
  • There is nothing in Gremlin Server that will list running queries. As for cancellation, according to standard TinkerPop semantics a Traversal should respect a request for interruption on a thread. These semantics are enforced by the TinkerPop process test suite. That said, it is still up to the graph provider to properly allow for that behavior. Reference

Backups

  • Binary format
  • Both full and incremental backups to files, S3, and Google Cloud Storage via MinIO.
  • Live backups with no downtime
  • Reference
  • JanusGraph acts as an abstraction layer on top of the storage backends and defers to the storage backends for administrative best practices. As a result, there is a lack of centralized documentation on backend administrative tasks. Reference
  • Cassandra offers snapshot, incremental, and commit-log backups. Reference
  • HBase backup offerings are summarized here

Pricing and Free Trial

  • Open source version is under Apache 2.0, so free to use and modify.
  • Enterprise version pricing is based on the number of Dgraph instances, not on cores, RAM, disk, etc.
Open Source under the Apache 2 license

Appropriate as primary database to build apps/data platform on

Dgraph is a general-purpose database.

JanusGraph’s use case is determined by the storage backend; JanusGraph is a graph engine, not a graph database.

Open Source

Yes

  • Apache 2.0. GitHub
  • Enterprise features are NOT Apache 2.0, but users can still read the source.
  • Dgraph’s open source and enterprise versions provide the same performance; the enterprise version simply has more features.
  • Dgraph supports many open standards, like gRPC, Protocol Buffers, Go contexts, and OpenCensus integration for distributed tracing.
Open Source under the Apache 2 license

Protocols

  • HTTP/HTTPS
  • gRPC
  • Protocol Buffers

Point in Time Recovery

On the roadmap

JanusGraph does not provide point-in-time recovery. It can be configured to keep a write-ahead log. Reference

Multi-region Deployments

Yes

This depends on the storage system used. Cassandra supports multi-region deployments. Reference

SQL Migration Tool

Yes

No

There are some suggested resources for doing this with Cassandra here.

Authentication and authorization

  • JSON web tokens
  • ACLs for enterprise
  • Integration with LDAP on the roadmap
HTTP Basic authentication and authentication over WebSocket. Reference

Drivers

  • Dgraph’s drivers use gRPC not REST
  • Any GraphQL compatible client can be used
  • Dgraph’s supported drivers are the same as Neo4j’s supported drivers: Java, JavaScript, Go, Python, .NET
  • Dgraph’s unofficial drivers are: Rust, Dart, Elixir
  • Reference
  • A list of TinkerPop drivers is available on TinkerPop’s homepage
  • In addition to drivers, there are Gremlin language variants for TinkerPop that make it easier to use Gremlin from different programming languages such as Java, Python, or C#.

Multi-database Features

Multi-tenancy on the roadmap

Edge Label Multiplicity

Graph Database As A Service (DBaaS)

Hosted solution launching in mid-year 2020

No

Query Execution Plans

Query planning on the roadmap

With JanusGraphManager, you can define a property in your configuration that determines how to access a graph.

Support for graph algorithms

  • Shortest k-paths
  • Edge traversal limit to determine cycles in graphs
  • Others requested from community listed here
JanusGraph’s documentation does not discuss graph algorithms, but one could follow this Gremlin recipe for shortest path, for example.
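As a language-neutral sketch of what such a shortest-path recipe computes, here is a breadth-first search over a toy adjacency list; the graph and names are purely illustrative:

```go
package main

import "fmt"

// shortestPath returns one shortest path between two vertices of an
// unweighted graph using breadth-first search — the same computation
// shortest-path recipes express in Gremlin.
func shortestPath(adj map[string][]string, from, to string) []string {
	prev := map[string]string{from: ""} // predecessor of each visited vertex
	queue := []string{from}
	for len(queue) > 0 {
		v := queue[0]
		queue = queue[1:]
		if v == to {
			// Walk predecessors back to reconstruct the path.
			var path []string
			for u := to; u != ""; u = prev[u] {
				path = append([]string{u}, path...)
			}
			return path
		}
		for _, w := range adj[v] {
			if _, seen := prev[w]; !seen {
				prev[w] = v
				queue = append(queue, w)
			}
		}
	}
	return nil // no path exists
}

func main() {
	g := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d"},
		"d": {"e"},
	}
	fmt.Println(shortestPath(g, "a", "e")) // [a b d e]
}
```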

Apache Spark integration

No

Users can leverage Apache Hadoop and Apache Spark to configure JanusGraph for distributed graph processing. Reference

Kafka integration

On the roadmap

There are no official plugins, but there are some integrations done by the community. Here is an example with HBase.

Import/export

  • Using BulkLoader or LiveLoader, Dgraph can read the data as is with no modification needed
  • Supported data formats are JSON and RDF
  • Exporting a database is explained here

Questions?

Contact Us