Knowledge Graphs

Solution Overview

A knowledge graph is a powerful tool for organizing and making sense of large amounts of data. It represents knowledge as a network of entities and their relationships, providing a rich and flexible way of modeling complex domains. However, building a knowledge graph is not without its challenges.

Dgraph is a graph database that is uniquely suited to addressing the challenges of complexity, scale, and performance. Its flexible schema allows for easy data integration, and its distributed architecture enables scalability and high availability. Dgraph also supports efficient graph queries through its ability to generate GraphQL query and mutation APIs directly from GraphQL Schemas (SDL). Additionally, when the limits of the GraphQL standard present themselves, Dgraph's DQL allows users to perform powerful graph queries that can be exposed in the GraphQL API.

In this example, we will walk through how to use Dgraph for building a Taxonomic Knowledge Graph (TKG), a particular class of KGs in which edges describe classes of things. This document describes key aspects of using Dgraph to build a TKG of IT skills and technology platforms known as "struct-ure/kg". The source for this project can be reviewed here.

Introduction to Knowledge Graphs

Knowledge Graphs (KGs) have become a prevalent mechanism for structuring knowledge on the Internet and within organizations. KGs represent knowledge using directed labeled graphs, which are commonly drawn as a Subject connected to an Object by a Predicate:

Subject --(Predicate)--> Object


A Subject in this context represents a node, which is usually an object, such as a person, computer, building, etc. The Object can be thought of as a characteristic or attribute of the Subject, such as the name “Stephen” in the case of a Person Subject. The Predicate denotes the type of relationship that exists between the Subject and Object; for instance, in the previous example, the Predicate is the “name” relationship.

An alternative term for Subject and Object is node. An alternative term for Predicate is edge.
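
As a minimal sketch of the Subject, Predicate, and Object model described above, the following Python snippet stores a few triples as tuples and looks up the Objects attached to a Subject. The identifiers (`person:1`, `works_at`, `company:1`, `Example Corp`) are hypothetical and exist only for illustration; this is not a description of how Dgraph stores data internally.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All identifiers are hypothetical.
triples = [
    ("person:1", "name", "Stephen"),
    ("person:1", "works_at", "company:1"),
    ("company:1", "name", "Example Corp"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject node."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return dict(index)


def objects_of(triples, subject, predicate):
    """Return every object reachable from `subject` over the edge `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


graph = index_by_subject(triples)
print(objects_of(triples, "person:1", "name"))
```

Note that the Object of one triple (`company:1`) can be the Subject of another, which is what turns a list of triples into a graph.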

A particular class of KGs in which edges describe classes of things is sometimes called a taxonomy. The remainder of this document describes key aspects of using Dgraph to build a Taxonomic Knowledge Graph (TKG) of IT skills and technology platforms known as “struct-ure/kg”. The source for this can be reviewed here.

Schema

Dgraph supports schema definition using both its native type system (the DQL schema) and the standard GraphQL Schema Definition Language (SDL). This document uses the SDL mechanism. One benefit of defining the schema in SDL is that Dgraph automatically generates a complete GraphQL API directly from it.

The principal Subject of our TKG is known as a Structure. It represents both tangible concepts (such as the C programming language) and classes of things (such as programming languages). Note that parts of this definition have been elided for readability; the full schema can be viewed here.


"""
Structure represents the basic element of the struct-ure knowledge graph.
"""
type Structure {
    id: String! @id @search(by: [regexp])

    "The label of the Structure, for instance in use in UI"
    label: [MultilingualText!]!

    "The name of the Structure"
    name: [MultilingualText!]!

    "The description of the Structure"
    description: [MultilingualText!]

    "Alternate names of the Structure"
    aliases: [MultilingualAlias!] @hasInverse(field: entity)

    "Related Structures"
    related: [Structure!]

    "The Structure's parent in the tree"
    parent: Structure
    
    "Children of this Structure"
    children: [Structure!] @hasInverse(field: parent)
}

"""
MultilingualText represents Structure text values represented in one or more written languages
"""
type MultilingualText {
    "The language identifier"
    lang: String! @search(by: [exact])
    "The actual value"
    value: String! @search(by: [exact, term, fulltext, regexp])
}

Note that the Structure type has the predicate children, which is an array of other Structures. This is the primary mechanism by which the taxonomic tree is constructed. Together with its inverse, the parent predicate, it allows one to navigate the entire tree in either direction.

The MultilingualText type allows multi-language support for labels, names, and descriptions.
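
To illustrate how a client might consume these multilingual fields, here is a minimal Python sketch that picks the value for a requested language and falls back to a default language. The `pick_text` helper and the sample entries are hypothetical; only the `lang` and `value` field names come from the schema above.

```python
def pick_text(entries, lang, fallback="en"):
    """Return the first value for `lang`, else the first for `fallback`, else None."""
    for wanted in (lang, fallback):
        for entry in entries:
            if entry["lang"] == wanted:
                return entry["value"]
    return None


# Hypothetical sample shaped like a list of MultilingualText objects.
names = [
    {"lang": "en", "value": "programming languages"},
    {"lang": "de", "value": "Programmiersprachen"},
]

print(pick_text(names, "de"))
print(pick_text(names, "fr"))  # no French entry, so the English fallback is used
```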

An obvious application for a TKG is a user interface control in which the entire tree can be navigated. There’s an interactive version of this here.

Populating the Graph

Data can be added to a TKG through fully automated, semi-automated, or manual mechanisms, or any combination of these. The TKG described in this document is populated in a fully automated manner: the definitions and relationships are stored logically in a file system, and an importer tool reads the structure and content of this file system to populate the graph.

Dgraph supports JSON and RDF-formatted import files.
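
The idea of deriving the tree from a directory layout can be sketched in Python as follows. This is not the project's importer: the `paths_to_structures` function is hypothetical, the English name is simply assumed to be the last path segment, and the record fields only mirror the schema above. The exact layout Dgraph expects in an import file is not shown in this document, so treat the output as illustrative JSON rather than a ready-made import file.

```python
import json

# Id prefix taken from the query results shown in this document.
BASE = "https://struct-ure.org/kg"


def paths_to_structures(paths, base=BASE):
    """Turn slash-separated relative paths into Structure-like records.

    Every ancestor of a path gets its own record, so the tree has no gaps.
    Each record holds an id, an English name, and a reference to its parent
    (None for the root).
    """
    records = {}
    for path in paths:
        segments = [s for s in path.strip("/").split("/") if s]
        for depth in range(len(segments) + 1):
            node_id = "/".join([base] + segments[:depth])
            parent_id = "/".join([base] + segments[:depth - 1]) if depth > 0 else None
            records.setdefault(node_id, {
                "id": node_id,
                # Assumption: the name is the last segment of the id.
                "name": [{"lang": "en", "value": node_id.rsplit("/", 1)[-1]}],
                "parent": {"id": parent_id} if parent_id else None,
            })
    return list(records.values())


records = paths_to_structures(["it/cloud-computing/amazon-web-services/s3"])
print(json.dumps(records, indent=2))
```

In a real importer the relative paths could come from walking a directory tree, for example with `os.walk`, and the names and descriptions would be read from the files themselves rather than inferred from directory names.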

Example Queries

Find an IT concept by a term

This query searches for the term “C” in all English entries of the KG.

Query:


query {
  queryStructure @cascade {
    id
    name(
      filter: { lang: { eq: "en" }, and: 
          [{ value: { allofterms: "C" } }] }
    ) {
      value
    }
    description(filter: { lang: { eq: "en" } }) {
      value
    }
  }
}

Result:


{
  "queryStructure":[
    {
      "id":"https://struct-ure.org/kg/it/programming-languages/objective-c",
      "name":[
        {
          "value":"Objective-C"
        }
      ],
      "description":[
        {
          "value":"general-purpose, high-level, object-oriented programming language"
        }
      ]
    },
    {
      "id":"https://struct-ure.org/kg/it/programming-languages/c",
      "name":[
        {
          "value":"C"
        }
      ],
      "description":[
        {
          "value":"general-purpose programming language"
        }
      ]
    },
    {
      "id":"https://struct-ure.org/kg/it/programming-languages/c++",
      "name":[
        {
          "value":"C++"
        }
      ],
      "description":[
        {
          "value":"general-purpose programming language"
        }
      ]
    }
  ]
}

Infer knowledge from a node’s location

This query illustrates the ability to deduce facts from the TKG. It searches for the term S3, which we might have parsed from a résumé, for example. The results allow us to infer that an individual who has experience with S3 also has experience with Amazon Web Services and with the general concept of cloud computing.

AI systems rely on taxonomies to infer knowledge in this manner. This is a good example of how a Dgraph TKG could be constructed to serve that purpose.

Query:


query {
  queryStructure @cascade {
    id
    name(
      filter: { lang: { eq: "en" }, 
          and: [{ value: { anyofterms: "S3" } }] }
    ) {
      value
    }
    description(filter: { lang: { eq: "en" } }) {
      value
    }
    parent {
      id
      parent {
        id
        parent {
          id
          parent {
            id
          }
        }
      }
    }
  }
}

Result:


{
  "queryStructure":[
    {
      "id":"https://struct-ure.org/kg/it/cloud-computing/amazon-web-services/s3",
      "name":[
        {
          "value":"Amazon S3"
        }
      ],
      "description":[
        {
          "value":"cloud storage service offered by Amazon Web Services"
        }
      ],
      "parent":{
        "id":"https://struct-ure.org/kg/it/cloud-computing/amazon-web-services",
        "parent":{
          "id":"https://struct-ure.org/kg/it/cloud-computing",
          "parent":{
            "id":"https://struct-ure.org/kg/it",
            "parent":{
              "id":"https://struct-ure.org/kg"
            }
          }
        }
      }
    }
  ]
}
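
The inference step itself, turning the nested parent objects into a flat list of broader concepts, can be sketched in a few lines of Python. The `ancestor_ids` helper is hypothetical; the sample data is a trimmed copy of the result above.

```python
def ancestor_ids(structure):
    """Walk the nested `parent` objects and return ancestor ids, nearest first."""
    ids = []
    node = structure.get("parent")
    while node:
        ids.append(node["id"])
        node = node.get("parent")
    return ids


# Trimmed copy of the query result shown above.
result = {
    "queryStructure": [
        {
            "id": "https://struct-ure.org/kg/it/cloud-computing/amazon-web-services/s3",
            "parent": {
                "id": "https://struct-ure.org/kg/it/cloud-computing/amazon-web-services",
                "parent": {
                    "id": "https://struct-ure.org/kg/it/cloud-computing",
                    "parent": {
                        "id": "https://struct-ure.org/kg/it",
                        "parent": {"id": "https://struct-ure.org/kg"},
                    },
                },
            },
        }
    ]
}

for structure in result["queryStructure"]:
    print(structure["id"], "implies", ancestor_ids(structure))
```

Because the query nests parent only four levels deep, a node that sits deeper in the tree would return a truncated ancestor list; the nesting depth in the query has to match the depth of the taxonomy.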

Extend GraphQL using custom graph queries

The GraphQL standard can hinder the ability to perform native graph queries. Dgraph supports the ability to bind native graph queries to the generated GraphQL API.

In this example, we use Dgraph’s DQL to combine multiple nested filters to search for a term in labels, names, and aliases by language. The resulting query, queryStructureByTermByLang, will be present in the generated GraphQL API.


  queryStructureByTermByLang(term: String!, lang: String!, first: Int=100, offset: Int=0): 
          [Structure] @custom(dql: """
    query q($term: string, $lang: string, $first: int=100, $offset: int=0) {
        LABELS as var(func: type(Structure)) @cascade {
            Structure.label @filter(anyofterms(MultilingualText.value, $term) AND 
                eq(MultilingualText.lang, $lang)) {
                uid
            }
        }
        NAMES as var(func: type(Structure)) @cascade {
            Structure.name @filter(anyofterms(MultilingualText.value, $term) AND 
                eq(MultilingualText.lang, $lang)) {
                uid
            }
        }
        ALIASES as var(func: type(Structure)) @cascade {
            Structure.aliases @filter(anyofterms(MultilingualAlias.values, $term) AND
                eq(MultilingualAlias.lang, $lang)) {
                uid
            }
        }
            
        queryStructureByTermByLang(func: uid(LABELS, NAMES, ALIASES), orderasc: Structure.id, 
            first: $first, offset: $offset) {
            id: Structure.id
            label: Structure.label {
                lang: MultilingualText.lang
                value: MultilingualText.value
            }
            name: Structure.name {
                lang: MultilingualText.lang
                value: MultilingualText.value
            }
            description: Structure.description {
                lang: MultilingualText.lang
                value: MultilingualText.value
            }
            aliases: Structure.aliases {
                lang: MultilingualAlias.lang
                values: MultilingualAlias.values
            }
            related: Structure.related
            parent: Structure.parent {
                id: Structure.id
            }
        }
    }  
  """)

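Once the custom query is part of the schema, a client can call it like any other GraphQL query. The Python sketch below builds the JSON body of such a request using only the standard library. The helper names, the selected fields, and the server URL in the final comment are assumptions for illustration; the request is built but not sent.

```python
import json
import urllib.request

# GraphQL document that calls the custom query defined above.
QUERY = """
query ($term: String!, $lang: String!, $first: Int) {
  queryStructureByTermByLang(term: $term, lang: $lang, first: $first) {
    id
    name { lang value }
  }
}
"""


def build_payload(term, lang, first=10):
    """Build the JSON body of a GraphQL-over-HTTP request."""
    return {
        "query": QUERY,
        "variables": {"term": term, "lang": lang, "first": first},
    }


def post_graphql(url, payload):
    """POST the payload and return the decoded JSON response (not called here)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_payload("S3", "en")
print(json.dumps(payload["variables"]))

# To run against a live server (the URL is an assumption, adjust to your setup):
# post_graphql("http://localhost:8080/graphql", payload)
```

Passing the term and language as variables, rather than formatting them into the query string, keeps user input out of the query text.
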
Going further

Have a look at these resources, which show how to create a GraphQL schema in Dgraph and how to use custom DQL to extend the generated GraphQL API.

https://dgraph.io/docs/graphql/schema/

https://dgraph.io/docs/graphql/custom/dql/