Dgraph 24.0.0-alpha is now available on GitHub and DockerHub

Dgraph v24.0.0-alpha is now available for the community to try out support for the vector data type, which enables semantic search.

Dgraph is adding vector support to combine graph data with embeddings, enhancing graph-based applications and unlocking new AI capabilities. Core graph use cases like fraud detection, recommendations, and master data management can all be supercharged by vectors and embeddings. Graph+Vector is also a key technique used to reduce hallucinations within AI-augmented applications.

This release also includes some performance enhancements and maintenance bug fixes to improve the stability of the database engine.

Key highlights of the release include:

  • Support for a native vector type at the DQL level

  • Extend Liveloader to work with the vector type (Bulkloader support will be available in GA)

  • Community contributed PRs:

    • #9030: Add support for Polish Language
    • #9047: Reduce x.ParsedKey memory allocation from 72 to 56 bytes
  • Dgraph/Badger fixes:

    • #9007: Fix deadlock occurring due to time-out
    • #2018: Reduce resource consumption on empty write transaction
  • Update to Golang v1.22 - performance and monitoring improvements

  • Upgraded Golang client

  • A number of CVE fixes

We are working towards a GA release candidate and expect it to be out in May. Dgraph v24 GA will also include GraphQL support for the vector data type and semantic search, a new caching approach that will boost performance of all applications, and a number of community PRs and maintenance fixes.

Note that this (alpha) release is not available on Dgraph Cloud, but the GA release will be available for both on-premise and Dgraph Cloud options. The release binaries and release notes are now available on GitHub. The Docker images for dgraph/dgraph and dgraph/standalone are available on DockerHub.

A simple example of using vector embeddings and similarity search queries is shown below. More examples will follow in blog posts and docs in the coming weeks. This example uses Ratel for the schema update, mutations, and queries, but you can use any approach.

Set up and install Dgraph and Ratel

Get a Dgraph docker container for the v24 alpha version

docker pull dgraph/standalone:v24.0.0-alpha2 

Run a docker container, storing data on your local machine

mkdir ~/dgraph

docker run -d --name dgraph-v24alpha2 -p "8080:8080" -p "9080:9080" -v ~/dgraph:/dgraph dgraph/standalone:v24.0.0-alpha2

Then get and start the Ratel tool

docker pull dgraph/ratel

docker run -d --name ratel -p "8000:8000"  dgraph/ratel:latest

Ratel will now be running on localhost:8000

Add a schema, data and test queries

Define a DQL Schema. You can set this via the Ratel schema tab using the bulk edit option.

<Issue.description>: string .

<Issue.vector_embedding>: float32vector @index(hnsw(metric:"cosine")) .
type <Issue> {
      Issue.description
      Issue.vector_embedding
}

Notice that the schema uses the new float32vector type with a new index type, hnsw. The hnsw index can use a distance metric of cosine, euclidean, or dotproduct. Here we use cosine similarity, which works well even if your vectors are not normalized.
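To make the difference between the three metric names concrete, here is a minimal Python sketch using the textbook definitions of dot product, Euclidean distance, and cosine similarity. It is not Dgraph's internal implementation, and the function names are illustrative only. It shows why cosine similarity is a reasonable choice for vectors that are not normalized: scaling a vector changes its dot product and Euclidean distance, but not its cosine similarity.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (straight-line) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 0.0]
b = [0.866025, 0.5]
b_scaled = [10 * x for x in b]  # same direction as b, ten times longer

# Cosine similarity is unchanged by scaling; the other two metrics are not.
print("cosine:   ", cosine_similarity(a, b), cosine_similarity(a, b_scaled))
print("dot:      ", dot(a, b), dot(a, b_scaled))
print("euclidean:", euclidean(a, b), euclidean(a, b_scaled))
```

If your embeddings are already normalized to unit length, the three metrics rank neighbors in the same order, so the choice matters most for unnormalized vectors.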

At this point, the database will accept and index float vectors.
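If you would rather not paste the schema into Ratel, it can also be sent to Dgraph's HTTP /alter endpoint. The Python sketch below only builds the request and does not send it. The base URL assumes the port mapping from the docker run command above, and the helper name build_alter_request is hypothetical.

```python
import urllib.request

# Assumption: Dgraph's HTTP port is mapped to localhost:8080, as in the docker run step.
DGRAPH_HTTP = "http://localhost:8080"

SCHEMA = """
<Issue.description>: string .
<Issue.vector_embedding>: float32vector @index(hnsw(metric:"cosine")) .
type <Issue> {
  Issue.description
  Issue.vector_embedding
}
"""

def build_alter_request(schema, base_url=DGRAPH_HTTP):
    """Build (but do not send) a POST request carrying the schema text to /alter."""
    return urllib.request.Request(
        url=base_url + "/alter",
        data=schema.encode("utf-8"),
        method="POST",
    )

req = build_alter_request(SCHEMA)
print(req.get_method(), req.full_url)
```

To apply the schema against a running instance, pass the request to urllib.request.urlopen(req) and inspect the response body.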

Insert some data containing short, test-only embeddings using this DQL Mutation

You can paste this into Ratel as a mutation, or use curl, pydgraph or similar:

{
   "set": 
    [
      {
         "dgraph.type": "Issue",
         "Issue.vector_embedding": "[1, 0]",
         "Issue.description":"Intermittent timeouts. Logs show no such host error."
      },
      {
         "dgraph.type": "Issue",
         "Issue.vector_embedding": "[0.866025, 0.5]",
         "Issue.description":"Bug when user adds record with blank surName. Field is required so should be checked in web page."
      },
      {
         "dgraph.type": "Issue",
         "Issue.vector_embedding": "[0.5, 0.866025]",
         "Issue.description":"Delays on responses every 30 minutes with high network latency in backplane"
      },
      {
         "dgraph.type": "Issue",
         "Issue.vector_embedding": "[0, 1]",
         "Issue.description":"Slow queries intermittently. The host is not found according to logs."
      },
      {
         "dgraph.type": "Issue",
         "Issue.vector_embedding": "[-0.5, 0.866025]",
         "Issue.description":"Some timeouts. It seems to be a DNS host lookup issue. Seeing No Such Host message."
      },
      {
         "dgraph.type": "Issue",
         "Issue.vector_embedding": "[-0.866025, 0.5]",
         "Issue.description":"Host and DNS issues are causing timeouts in the User Details web page"
      }
    ]
} 
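For readers scripting the load instead of using Ratel, a mutation payload of the same shape can be built in Python. This is a minimal sketch that constructs the request without sending it. The helper names issue and build_mutation_request are hypothetical, the base URL assumes the port mapping from the docker run step, and the /mutate?commitNow=true path is Dgraph's HTTP mutation endpoint with immediate commit.

```python
import json
import urllib.request

def issue(description, embedding):
    """Build one Issue node; the vector is serialized as a string, as in the mutation above."""
    return {
        "dgraph.type": "Issue",
        "Issue.vector_embedding": json.dumps(embedding),
        "Issue.description": description,
    }

def build_mutation_request(issues, base_url="http://localhost:8080"):
    """Build (but do not send) a JSON mutation request that commits immediately."""
    body = json.dumps({"set": issues}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/mutate?commitNow=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

issues = [
    issue("Intermittent timeouts. Logs show no such host error.", [1, 0]),
    issue("Delays on responses every 30 minutes with high network latency in backplane",
          [0.5, 0.866025]),
]
req = build_mutation_request(issues)
print(req.data.decode("utf-8"))
```

Sending the request with urllib.request.urlopen(req) against a running instance would insert the two nodes. Only two of the six sample issues are shown to keep the sketch short.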

A simple query that finds similar issues

You are now ready to run similarity queries that find Issues based on semantic similarity to a new Issue description. For simplicity, we are not computing large vectors from an LLM. Instead, the embeddings in this example simply represent four concepts, one per vector dimension. The dimensions are, respectively:

  • Slowness or delays

  • Logging or messages

  • Networks

  • GUIs or web pages


Use case and query

Let’s say a new issue comes in, and you want to use the text description to find other, similar issues you have seen in the past. Use the similarity query below.

If the new issue description is “Slow response and delay in my network!”, we represent this new issue as the vector [0.9, 0.8, 0, 0]. The first “slowness” dimension is high because the description mentions both “slow response” and “delay.” “Logs” is mentioned once, so set dimension two to 0.8. Neither networks nor GUIs are mentioned, so leave those at 0.

Note that the first parameter to similar_to is the DQL field name, the second parameter is the number of results to return, and the third parameter is the vector to look for.

query slownessWithLogs() {
    simVec(func: similar_to(Issue.vector_embedding, 3, "[0.9, 0.8, 0, 0]")) {
        uid
        Issue.description
    }
}
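Conceptually, a similar_to query with a cosine metric asks for the k stored vectors that are closest to the query vector. The Python sketch below shows that idea as a brute-force ranking over the six sample embeddings from the mutation. It is an illustration only: an hnsw index is an approximate nearest-neighbor structure and does not scan every vector. The labels "issue 1" through "issue 6" (in mutation order) and the two-dimensional query vector are hypothetical, chosen to match the dimensionality of the sample data.

```python
import math

# The six sample embeddings from the mutation, labeled in insertion order.
EMBEDDINGS = {
    "issue 1": [1, 0],
    "issue 2": [0.866025, 0.5],
    "issue 3": [0.5, 0.866025],
    "issue 4": [0, 1],
    "issue 5": [-0.5, 0.866025],
    "issue 6": [-0.866025, 0.5],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k):
    """Return the names of the k vectors most similar to the query (exact, brute force)."""
    scored = [(cosine_similarity(query, v), name) for name, v in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

query = [1, 0]  # hypothetical query vector
nearest = top_k(query, EMBEDDINGS, 3)
print(nearest)  # ['issue 1', 'issue 2', 'issue 3']
```

The sample vectors sit 30 degrees apart on the unit circle, so the three results are the vectors at 0, 30, and 60 degrees from the query.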

If you want to pass the vector in as a query variable, rewrite the query as

query test($vec: float32vector) {
    simVec(func: similar_to(Issue.vector_embedding, 3, $vec)) 
    {
        uid
        Issue.description
    }
}

Then make a request (again using Ratel) with a variable named “vec” set to a JSON array of floats:

vec: [0.9, 0.8, 0, 0]

Curl alternative 

Finally, if you prefer not to use Ratel, you can do all of these steps with HTTP tools such as curl:

curl --location 'http://localhost:8080/query' \
--header 'Content-Type: application/json' \
--data '{
  "query": "query test($vec: float32vector) { simVec(func: similar_to(Issue.vector_embedding, 3, $vec)) { uid Issue.description } }",
  "variables":{"$vec":"[1,0,0,0]"}
  }'
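The same request can be assembled from Python with only the standard library. This is a sketch that mirrors the curl call above. It builds the request without sending it, and the helper name build_query_request is hypothetical.

```python
import json
import urllib.request

QUERY = (
    "query test($vec: float32vector) { "
    "simVec(func: similar_to(Issue.vector_embedding, 3, $vec)) "
    "{ uid Issue.description } }"
)

def build_query_request(vec, base_url="http://localhost:8080"):
    """Build (but do not send) a parameterized query; the vector is passed as a string."""
    body = json.dumps({
        "query": QUERY,
        "variables": {"$vec": json.dumps(vec)},
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request([1, 0, 0, 0])
print(req.data.decode("utf-8"))
```

Against a running instance, urllib.request.urlopen(req) returns the JSON response, which can be parsed with json.load.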

Summing it up

This end-to-end example shows how you can insert data with vector embeddings, corresponding to a schema that specifies a cosine-similarity based vector index, and do a semantic search for Issues via the new similar_to() function in Dgraph.