Knowledge Graphs
Solution Overview
A knowledge graph is a powerful tool for organizing and making sense of large amounts of data. It represents knowledge as a network of entities and their relationships, providing a rich and flexible way of modeling complex domains. However, building a knowledge graph is not without its challenges.
Dgraph is a graph database that is uniquely suited to addressing the challenges of complexity, scale, and performance. Its flexible schema allows for easy data integration, and its distributed architecture enables scalability and high availability. Dgraph also supports efficient graph queries through its ability to generate GraphQL query and mutation APIs directly from GraphQL Schemas (SDL). Additionally, when the limits of the GraphQL standard present themselves, Dgraph's DQL allows users to perform powerful graph queries that can be exposed in the GraphQL API.
In this example, we will walk through how to use Dgraph for building a Taxonomic Knowledge Graph (TKG), a particular class of KGs in which edges describe classes of things. This document describes key aspects of using Dgraph to build a TKG of IT skills and technology platforms known as "struct-ure/kg". The source for this project can be reviewed here.
Introduction to Knowledge Graphs
Knowledge Graphs (KGs) have become a prevalent mechanism for structuring knowledge on the Internet and within organizations. KGs represent knowledge using directed labeled graphs, which can be visualized as a simple, well-understood triple: Subject → Predicate → Object.
A Subject in this context represents a node, which is usually an object, such as a person, computer, or building. The Object can be thought of as a characteristic or attribute of the Subject, such as the name “Stephen” in the case of a Person Subject. The Predicate denotes the type of relationship that exists between the Subject and the Object; in this example, the Predicate is the “name” relationship.
An alternative term for Subject and Object is node. An alternative term for Predicate is edge.
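The Subject, Predicate, Object model can be illustrated with a few lines of Python. This is only a minimal in-memory sketch and does not use Dgraph; the identifier "person1", the "type" predicate, and the helper objects_of are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the node the statement is about
    predicate: str  # the edge label, i.e. the kind of relationship
    obj: str        # the node or literal value the edge points to


# Hypothetical facts; "person1" is an illustrative identifier, not a real UID.
triples = [
    Triple("person1", "type", "Person"),
    Triple("person1", "name", "Stephen"),
]


def objects_of(subject: str, predicate: str) -> list[str]:
    """Return every Object reachable from `subject` over the edge `predicate`."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of("person1", "name"))  # ['Stephen']
```

Each tuple is one labeled, directed edge, so a whole graph is just a collection of such statements.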
A particular class of KGs in which edges describe classes of things is sometimes called a taxonomy. The remainder of this document describes key aspects of using Dgraph to build a Taxonomic Knowledge Graph (TKG) of IT skills and technology platforms known as “struct-ure/kg”. The source for this can be reviewed here.
Schema
Dgraph supports schema definition using both its native schema language (part of DQL, the Dgraph Query Language) and the standard GraphQL Schema Definition Language (SDL). This document describes using the SDL mechanism. One benefit of this approach is that Dgraph automatically generates a complete GraphQL API directly from the SDL.
The principal Subject of our TKG is known as a Structure. It represents both tangible concepts (such as the C programming language) and classes of things (such as programming languages). Note that parts of this definition have been elided for readability; the full schema can be viewed here.
Note that the Structure type has a children predicate, which is an array of other Structures. This is the primary mechanism by which the taxonomic tree is constructed. Together with children, the parent predicate allows one to navigate the tree freely in both directions.
The MultilingualText type allows multi-language support for labels, names, and descriptions.
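To make the shape of these two types concrete, here is a rough Python analogue. It is a sketch under assumptions, not the project's schema: only the names Structure, MultilingualText, children, and parent come from the text above, while the fields name, labels, language, and value and the helpers add_child and path are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MultilingualText:
    language: str  # assumed to be a language code such as "en"
    value: str


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Structure:
    name: str
    labels: list[MultilingualText] = field(default_factory=list)
    parent: Optional[Structure] = None
    children: list[Structure] = field(default_factory=list)

    def add_child(self, child: Structure) -> Structure:
        """Attach `child` below this node and set its parent edge."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Walk the parent edges up to the root and return the names root-first."""
        names = []
        node: Optional[Structure] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


languages = Structure("programming languages")
c_lang = languages.add_child(
    Structure("C", labels=[MultilingualText("en", "C programming language")])
)
print(c_lang.path())  # ['programming languages', 'C']
```

Storing both directions of the relationship is what makes downward traversal (children) and upward traversal (parent) equally cheap.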
An obvious application for a TKG is a user interface control in which the entire tree can be navigated. There’s an interactive version of this here.
Populating the Graph
Data can be added to a TKG through fully automated, semi-automated, or manual mechanisms, or any combination of these. The TKG that is the subject of this document is populated in a fully automated manner: the definitions and relationships are stored logically in a file system, and an importer tool reads the structure and content of this file system to populate the graph.
Dgraph supports JSON and RDF-formatted import files.
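As an illustration of the file-system-driven approach, the sketch below turns a directory tree into nested JSON. It is not the project's importer: the directory layout, the function names, and the "name" and "children" keys are all assumptions, and the output would still need to be adapted to whatever import format is actually used.

```python
import json
import tempfile
from pathlib import Path


def directory_to_node(path: Path) -> dict:
    """Convert a directory into a nested dict; each subdirectory becomes a child."""
    node = {"name": path.name}
    children = [directory_to_node(p) for p in sorted(path.iterdir()) if p.is_dir()]
    if children:
        node["children"] = children
    return node


def build_demo_tree(root: Path) -> None:
    """Create a small, hypothetical taxonomy layout on disk."""
    (root / "cloud" / "aws" / "s3").mkdir(parents=True)
    (root / "languages" / "c").mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "it"
    build_demo_tree(demo_root)
    print(json.dumps(directory_to_node(demo_root), indent=2))
```

Because the hierarchy is taken directly from the directory nesting, adding a new concept is a matter of adding a directory in the right place and re-running the import.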
Example Queries
Find an IT concept by a term
This query searches for the term “C” in all English entries of the KG.
Query:
Result:
Infer knowledge from a node’s location
This query illustrates the ability to deduce facts from the TKG. It searches for the term S3, which we might, for example, have parsed from a résumé. The results allow us to infer that an individual who has experience with S3 also has experience with Amazon Web Services and with the general concept of cloud computing.
AI systems rely on taxonomies to infer knowledge in this manner. This is a good example of how a Dgraph TKG could be constructed to serve that purpose.
Query:
Result:
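The inference itself amounts to walking parent edges from the matched node up to the root. The following sketch shows that idea on an in-memory mapping rather than on Dgraph; the PARENT table and the ancestors helper are hypothetical and contain only the three concepts mentioned above.

```python
# Hypothetical child -> parent edges, limited to the concepts named in the text.
PARENT = {
    "S3": "Amazon Web Services",
    "Amazon Web Services": "Cloud Computing",
}


def ancestors(term: str) -> list[str]:
    """Follow parent edges upward and return the ancestors, nearest first."""
    result = []
    seen = {term}
    while term in PARENT:
        term = PARENT[term]
        if term in seen:  # guard against accidental cycles in the data
            break
        seen.add(term)
        result.append(term)
    return result


print(ancestors("S3"))  # ['Amazon Web Services', 'Cloud Computing']
```

Every ancestor returned is a broader concept that can reasonably be attributed to anyone who lists the original term.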
Extend GraphQL using custom graph queries
The GraphQL standard can hinder the ability to perform native graph queries. Dgraph supports the ability to bind native graph queries to the generated GraphQL API.
In this example, we use Dgraph’s DQL to combine multiple nested filters to search for a term in labels, names, and aliases by language. The API endpoint, queryStructureByTermByLang, will be present in the generated GraphQL API.
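The filtering logic behind such a query can be approximated in plain Python. This is a stand-in for illustration only, not DQL and not the generated API: the Text and Node classes, the sample data, and the choice of case-insensitive exact matching are all assumptions, and the function name merely mirrors queryStructureByTermByLang.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    language: str
    value: str


@dataclass
class Node:
    labels: list[Text] = field(default_factory=list)
    names: list[Text] = field(default_factory=list)
    aliases: list[Text] = field(default_factory=list)


def matches(texts: list[Text], term: str, lang: str) -> bool:
    """True if any entry in `texts` is in `lang` and equals `term`, ignoring case."""
    return any(t.language == lang and t.value.lower() == term.lower() for t in texts)


def query_structure_by_term_by_lang(nodes: list[Node], term: str, lang: str) -> list[Node]:
    """Return nodes whose labels, names, or aliases contain `term` in `lang`."""
    return [
        n for n in nodes
        if matches(n.labels, term, lang)
        or matches(n.names, term, lang)
        or matches(n.aliases, term, lang)
    ]


# Hypothetical sample data.
s3 = Node(names=[Text("en", "Amazon S3")], aliases=[Text("en", "S3")])
c_lang = Node(names=[Text("en", "C")], labels=[Text("en", "C programming language")])
nodes = [s3, c_lang]

print(len(query_structure_by_term_by_lang(nodes, "s3", "en")))  # 1
```

The point of the sketch is the combination of conditions: a language filter nested inside each of three field filters, joined with a logical OR.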
Going further
Have a look at the following resources; they also show how to create a GraphQL schema in Dgraph and how to use custom DQL to extend the generated GraphQL API.
https://dgraph.io/docs/graphql/schema/