How Capventis is Using Dgraph to Streamline Messy Legacy Data
“Dgraph is a no-brainer. I can ingest any data and any structure, and I don’t have to worry about it. If I were trying to do this in a SQL table, we would end up with horrible joins and get tied up in knots. Dgraph is a godsend.”
Executive Summary
Capventis pulls vast quantities of legacy and real-time data from diverse sources for its top-tier clients. The team needed a solution that could streamline and scale, offering improved insights and performance to deliver their clients the best results. After testing multiple graph databases, Capventis found Dgraph to be the easiest to use and the best performing, making the Capventis team more efficient, accurate, and flexible.
Problem
As a specialist business and technology consulting company based in the UK, Capventis provides solutions for customer engagement, experience management, analytics, and data science. The company’s technology facilitates and simplifies complex integrations from multiple data sources and formats, providing holistic views of customer experiences and engagement.
Customer experience design, support, and sentiment analysis require a range of tools and data. Examples include:
- Broadcasters might need analytics on how users are consuming media. By discerning consumers’ media preferences, the broadcaster can provide real-time recommendations. This data can also be used to set email cadence and offers.
- Automotive retailers may want to track multichannel interactions across email, websites, dealer visits, test-drive feedback, and post-purchase sentiment.
With notable customers in all major market segments, Capventis integrates data that might come from legacy datasets with hundreds of tables, as well as modern API-driven SaaS applications.
This variety of data and integrations required a specific data stack, which Capventis named Glü. Based on the scope and scale of their projects, the team wanted a GraphQL-based interface with a flexible, reliable, high-performance graph database on the back end.
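Glü’s GraphQL-facing design maps naturally onto Dgraph, which can generate a full API directly from a GraphQL schema. As an illustrative sketch only (these types, fields, and directives are hypothetical, not Capventis’ actual schema), a customer-engagement model might be declared like this:

```graphql
# Hypothetical schema sketch for a Dgraph GraphQL API.
# @search adds query-time filtering; @hasInverse keeps both
# directions of the edge in sync automatically.
type Customer {
  id: ID!
  name: String! @search(by: [term])
  interactions: [Interaction] @hasInverse(field: customer)
}

type Interaction {
  id: ID!
  channel: String! @search(by: [exact])
  occurredAt: DateTime!
  customer: Customer!
}
```

From a schema like this, Dgraph exposes generated queries and mutations (e.g., `queryCustomer`, `addInteraction`) without hand-written resolvers, which fits the rapid-integration requirement listed above.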
Database design requirements included:
- Flexible data source integration and ETL
- Rapid scalability from test applications to global scale production environments
- Platform agnostic – works on Windows, Linux, macOS, iOS, and Android
- Small footprint and efficient data usage
- Reliable and enterprise-ready
Approach
The Capventis team uses the Go programming language for most integration projects. The company was already bumping up against the limits of legacy SQL databases: rapid, flexible, scale-out integrations were hard to achieve with tabular structures that required large numbers of joins to combine disparate data sources. Moving from tabular structures to a graph database would give Capventis more flexibility and agility to design projects and mix and match data sources and types.
When Dgraph was released into the Go community, Capventis decided to test it as a solution, alongside two other graph databases: Neo4j and Cayley. One of the alternatives was complicated to deploy and configure, causing headaches in development and limiting scalability. The other proved extremely inefficient: moving a million records for GDPR compliance consumed gigabytes of space and exhausted the available memory.
In testing, Dgraph was far superior to the alternatives. It was configured and running in less than an hour, and it quickly scaled from one compute node to dozens with minimal configuration changes. Dgraph also provided global sharding, further enhancing scalability. The team found that Dgraph had a tiny data footprint that handled the million-record test efficiently and made it suitable for embedding.
The Capventis team preferred the added features of Dgraph’s DQL query language.
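DQL extends a GraphQL-like syntax with graph-native features such as root functions, filters, and facets (key-value metadata attached to edges). As an illustrative sketch only (the predicate and facet names here are hypothetical and assume an appropriate schema and indexes), a media-consumption query might look like:

```
{
  # Hypothetical DQL query: find active viewers and the titles
  # they watched, pulling the per-edge "rating" facet along
  # with each watched title.
  viewers(func: type(Customer)) @filter(ge(watch_count, 10)) {
    name
    watched @facets(rating) {
      title
      genre
    }
  }
}
```

Because the traversal and the edge metadata come back in a single query, there is no equivalent of the multi-table join that the same question would require in SQL.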
Outcome
A real test of Dgraph came with a project for a government agency that involved integrating several older proprietary databases that were still in flight, regularly being updated with new information.
The database involved multiple departments that had merged and separated and merged again over the years. As a result, what should have been a dozen tables had exploded into 300, and the existing database vendor refused to collaborate on the project. The government agency couldn’t provide Capventis with an accurate data schema. Meanwhile, another department added data but continued making changes to that data even as the aggregation was underway.
The Capventis team used Dgraph to convert all the legacy data from the multiple sources, cleansing it on the fly with no data loss, and generated a clear schema ready for immediate queries.
Capventis continues to use Dgraph for a wide range of projects. The team deploys Dgraph in AWS and Microsoft Azure as Docker containers and maintains Dgraph and Glü development environments that can run on laptops. For Glü, Capventis has added many additional capabilities such as encryption, rate limiting, and data schema visualizations, all leveraging Dgraph. Capventis frequently uses Dgraph’s DQL to quickly build interfaces for exporting compound datasets collected from various systems into multiple business intelligence tools simultaneously.
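The container-based deployments described above are straightforward to reproduce locally. A minimal sketch of a single-group Dgraph cluster in Docker Compose, assuming the stock `dgraph/dgraph` image and default ports, might look like:

```yaml
# Minimal sketch: one Zero (cluster coordinator) and one Alpha
# (data node). Laptop-scale only; production clusters add
# replicas and persistent volumes.
services:
  zero:
    image: dgraph/dgraph:latest
    command: dgraph zero --my=zero:5080
    ports:
      - "6080:6080"   # Zero HTTP admin
  alpha:
    image: dgraph/dgraph:latest
    command: dgraph alpha --my=alpha:7080 --zero=zero:5080
    ports:
      - "8080:8080"   # HTTP (GraphQL / DQL)
      - "9080:9080"   # gRPC
```

The same topology scales out by adding Alpha services (and replicas via Zero), which matches the one-node-to-dozens scaling behavior observed in testing.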
Conclusion
To date, Dgraph has proven highly available, reliable, and fast. Capventis hasn’t encountered any performance issues, despite working with some very messy data integrations and complex graph traversals. In the rare instances when Capventis has had a problem, they found the Dgraph community to be responsive and friendly. That level of support is cited as a significant part of the decision to push Dgraph to all relevant Capventis customers.
For Capventis customers, Dgraph often opens up new horizons in data. By enabling easy linkages between edges and facets and providing flexibility in how queries are built, Dgraph gives users views of their data, and of the relationships between nodes in the graph, that were not previously possible to explore.