GraphLake Documentation

Version: 0.1.0

Date: January 18, 2025

Introduction

GraphLake is a graph database that supports a unified OneGraph model, seamlessly combining property graphs and RDF. This means you do not need to choose between the two graph types; it supports both simultaneously, offering flexibility and versatility for a wide range of use cases.

Designed for scalability and performance, GraphLake supports both large analytical workloads and Online Transaction Processing (OLTP) scenarios. Its architecture draws inspiration from lakehouse table formats such as Apache Iceberg and Delta Lake. However, GraphLake introduces automatic data partitioning, with representation and filtering mechanisms specifically optimized for graph data structures.

GraphLake is engineered to work efficiently with files stored locally on high-performance NVMe or SSD drives or in cloud storage solutions such as Amazon S3 (and S3-compatible options like MinIO) or Azure Blob Storage. The system allows independent scaling of compute and storage resources to match workload requirements. For instance, you can store large volumes of data cost-effectively while using a small compute instance, or opt for a configuration with smaller datasets and larger compute resources to support high-performance querying.

Leveraging parallel processing and in-memory caching, GraphLake delivers impressive performance, with the ability to scale hardware resources for even greater throughput.

GraphLake is not offered as a managed service. Instead, it is designed to be deployed within your cloud environment or run on your hardware, giving you complete control over your setup and infrastructure. We provide a developer edition that is free to use indefinitely, making it easy to get started with GraphLake and explore its capabilities.

Versions

There are four editions of GraphLake.

Installation

The GraphLake developer edition is available now; commercial releases are coming soon. If you would like to know more or start using GraphLake commercially, please email "contact @ dataplatformsolutions.com".

GraphLake is available as a Docker image on Docker Hub.

Downloads

Either download and unpack the binary from the .zip package and place it on your path, or pull the docker image:

docker pull dataplatformsolutions/graphlake:latest

Cloud Deployments

To support deploying GraphLake into your own cloud, we provide example Terraform scripts that can be adapted to your specific setup. Deployment of GraphLake through the AWS and Azure marketplaces is coming soon.

AWS

AWS Terraform scripts: https://github.com/dataplatformsolutions/graphlake-deploy

Available in the AWS marketplace - coming soon!

Azure

Azure Terraform scripts: https://github.com/dataplatformsolutions/graphlake-deploy

Available in the Azure marketplace - coming soon!

Digital Ocean

Digital Ocean Terraform scripts: https://github.com/dataplatformsolutions/graphlake-deploy

Config and Startup

GraphLake supports the following startup config environment variables:

Environment Variable        Description
GRAPHLAKE_BACKEND_TYPE      The storage backend to use (local, s3, azure). Only local is supported at the moment.
GRAPHLAKE_STORE_PATH        The path used to store data when using the local backend.
GRAPHLAKE_PORT              The port to listen on.
GRAPHLAKE_LOG_LEVEL         The log level to use: debug or info.
GRAPHLAKE_CACHE_SIZE        The size of the datafile cache.
GRAPHLAKE_IMPORT_LOCATION   A path relative to the store path, used to locate files for import.
GRAPHLAKE_ADMIN_PASSWORD    The password for the admin user. Used by the Business edition and up to secure GraphLake.
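To see how these variables fit together, here is a small client-side sketch that collects them from the environment. The fallback defaults here are assumptions for the example, not documented GraphLake defaults.

```python
import os

def load_config(env=None):
    # Gather GraphLake's documented startup variables; the defaults below
    # are illustrative assumptions only.
    env = os.environ if env is None else env
    return {
        "backend_type": env.get("GRAPHLAKE_BACKEND_TYPE", "local"),
        "store_path": env.get("GRAPHLAKE_STORE_PATH", "/data"),
        "port": int(env.get("GRAPHLAKE_PORT", "8080")),
        "log_level": env.get("GRAPHLAKE_LOG_LEVEL", "info"),
        "cache_size": env.get("GRAPHLAKE_CACHE_SIZE"),
        "import_location": env.get("GRAPHLAKE_IMPORT_LOCATION", "import"),
    }

# Unset variables fall back to the illustrative defaults.
config = load_config({"GRAPHLAKE_BACKEND_TYPE": "local", "GRAPHLAKE_PORT": "9090"})
```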

To run GraphLake in a Docker container as a detached process, passing the environment variables, publishing the service port, and mapping a local folder to the configured data path, you would run:

docker run -d \
      -e GRAPHLAKE_BACKEND_TYPE=local \
      -e GRAPHLAKE_STORE_PATH=/data \
      -e GRAPHLAKE_PORT=8080 \
      -e GRAPHLAKE_IMPORT_LOCATION=import \
      -p 8080:8080 \
      -v /path/to/local/data:/data \
      dataplatformsolutions/graphlake:latest

API

Query

POST /stores/:store/query

Execute a query in the specified :store. GraphLake supports three query languages: SPARQL (a core subset), SemanticCypher, and GraphLake JavaScript. See the Query section for details on each language. Use the following content types to specify which query language you are using: application/sparql, application/x-graphlake-query-semanticcypher, application/x-graphlake-query-javascript.

Method POST
Endpoint /stores/:store/query
Description Runs a query on a given store.
Request Body

A string containing the query. The format depends on the underlying query engine.


POST /stores/myStore/query
Content-Type: text/plain

SELECT ?s ?p ?o WHERE { ?s ?p ?o }
            
Response

JSON result of the query. For example:


HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "s": "http://example.org#subject1",
      "p": "http://example.org#predicate1",
      "o": "http://example.org#object1"
    },
    ...
  ]
}
            

Note that the query result structure can differ based on the query language used.
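As a client-side illustration, the following sketch builds the query request described above as a plain dictionary and extracts rows from the documented SPARQL-style response shape. The base URL and helper names are assumptions for this example.

```python
import json

# Content types documented for the query endpoint.
CONTENT_TYPES = {
    "sparql": "application/sparql",
    "semanticcypher": "application/x-graphlake-query-semanticcypher",
    "javascript": "application/x-graphlake-query-javascript",
}

def build_query_request(store, query, language, base="http://localhost:8080"):
    # Describe the POST /stores/:store/query request without sending it.
    return {
        "method": "POST",
        "url": f"{base}/stores/{store}/query",
        "headers": {"Content-Type": CONTENT_TYPES[language]},
        "body": query,
    }

req = build_query_request("myStore", "SELECT ?s ?p ?o WHERE { ?s ?p ?o }", "sparql")

# Extract rows from the documented response shape.
response_body = ('{"results": [{"s": "http://example.org#subject1", '
                 '"p": "http://example.org#predicate1", '
                 '"o": "http://example.org#object1"}]}')
rows = [(r["s"], r["p"], r["o"]) for r in json.loads(response_body)["results"]]
```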


Import Data into a Graph

POST /stores/:store/graphs/:graph/import

Import data (in N-Triples or CSV format) into the specified :graph. To import CSV rows as nodes, use the content type application/x-graphlake-node+csv; for edges, use application/x-graphlake-edge+csv.

Method POST
Endpoint /stores/:store/graphs/:graph/import
Description Imports data into the given graph. The data can be sent via request body or by specifying a file parameter.
Query Parameters
  • file (optional): The name of the file (located in the configured import directory). If not provided, the server expects data in the request body.
Request Body

If the file query parameter is not provided, send the data in the request body:


POST /stores/myStore/graphs/myGraph/import
Content-Type: application/n-triples

<http://example.org#subject> <http://example.org#predicate> <http://example.org#object> .
            

Use the appropriate content type so the server knows how to interpret the data being imported.
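A hypothetical helper sketch tying together the body-versus-file choice and the content types above (the base URL, the helper names, and the edge CSV content type are assumptions for illustration):

```python
from urllib.parse import urlencode

# Content types as documented above; the edge variant is an assumption.
CONTENT_TYPES = {
    "ntriples": "application/n-triples",
    "nodes_csv": "application/x-graphlake-node+csv",
    "edges_csv": "application/x-graphlake-edge+csv",  # assumed edge variant
}

def build_import_request(store, graph, fmt, data=None, file=None,
                         base="http://localhost:8080"):
    # Either send inline data in the body, or name a file that lives in
    # the server's configured import directory (GRAPHLAKE_IMPORT_LOCATION).
    url = f"{base}/stores/{store}/graphs/{graph}/import"
    if file is not None:
        url += "?" + urlencode({"file": file})
    return {"method": "POST", "url": url,
            "headers": {"Content-Type": CONTENT_TYPES[fmt]},
            "body": data}

req = build_import_request("myStore", "myGraph", "ntriples", file="people.nt")
```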

CSV Structure Documentation

Nodes CSV

The CSV file for nodes should define each node with its unique identifier, label, and properties. Each row represents a single node.

Required Columns:
  • id: A unique identifier for the node.
  • label: The type or label of the node (e.g., Person, Company).
Optional Columns:
  • Additional properties of the node (e.g., name, age, location).
Example:

id,label,name,age
1,Person,Alice,30
2,Person,Bob,40
3,Company,Acme Corp,
Edges CSV

The CSV file for edges should define the relationships between nodes. Each row represents a single edge.

Required Columns:
  • source: The ID of the source node.
  • target: The ID of the target node.
  • relationship: The type of relationship (e.g., FRIEND, WORKS_AT).
Optional Columns:
  • Additional properties of the relationship (e.g., weight, timestamp).
Example:

source,target,relationship,weight
1,2,FRIEND,1.0
2,3,WORKS_AT,0.8
Usage Notes
  • The id in the nodes CSV and the source and target in the edges CSV must match to correctly associate nodes and relationships.
  • Ensure that the CSV files use a consistent delimiter (e.g., commas).
  • Handle missing data (e.g., empty cells) appropriately based on the graph database's requirements.
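When producing these files programmatically, the standard library's CSV support keeps delimiters and quoting consistent. A small sketch matching the structures above:

```python
import csv
import io

def to_csv(header, rows):
    # Render a header row plus data rows as CSV text.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Nodes: id and label are required; other columns are node properties.
nodes_csv = to_csv(["id", "label", "name", "age"],
                   [[1, "Person", "Alice", 30],
                    [2, "Person", "Bob", 40],
                    [3, "Company", "Acme Corp", ""]])  # empty cell: no age

# Edges: source and target must match node ids above.
edges_csv = to_csv(["source", "target", "relationship", "weight"],
                   [[1, 2, "FRIEND", 1.0],
                    [2, 3, "WORKS_AT", 0.8]])
```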
Response

Returns 200 OK on successful import.


Create Graph

POST /stores/:store/graphs/:graph

Create an empty graph in the specified :store.

Method POST
Endpoint /stores/:store/graphs/:graph
Description Creates a new graph with the provided name.
Request Body No body is required. The :graph path parameter is the graph name.
Response

HTTP/1.1 200 OK
{
  "message": "Graph created"
}
            

List Graphs

GET /stores/:store/graphs

Retrieve a list of available graphs in the specified :store.

Method GET
Endpoint /stores/:store/graphs
Description Lists all graph names in the store.
Response

An array of strings, each representing a graph name.


HTTP/1.1 200 OK
Content-Type: application/json

[
  "graph1",
  "graph2",
  ...
]
            

Delete Graph

DELETE /stores/:store/graphs/:graph

Delete a specific graph within a store.

Method DELETE
Endpoint /stores/:store/graphs/:graph
Description Deletes the specified graph.
Request Body No body is required. The :graph path parameter identifies the graph.
Response

HTTP/1.1 200 OK
{
  "message": "Graph deleted"
}
            

Create Store

POST /stores/:store

Create a new store.

Method POST
Endpoint /stores/:store
Description Creates a new store using the provided name.
Request Body No request body is required. :store is the store name.
Response

HTTP/1.1 200 OK
{
  "message": "Store created"
}
            

List Stores

GET /stores

Retrieve a list of all stores. If security is enabled, only stores accessible to the current user are returned.

Method GET
Endpoint /stores
Description Lists all stores.
Response

An array of strings, each representing a store name.


HTTP/1.1 200 OK
[
  "store1",
  "store2",
  ...
]
            

Delete Store

DELETE /stores/:store

Delete a specified store. This operation is irreversible.

Method DELETE
Endpoint /stores/:store
Description Deletes the specified store.
Response

HTTP/1.1 200 OK
{
  "message": "Store deleted"
}
            

Add User

POST /admin/users

Create a new user in the system.

Method POST
Endpoint /admin/users
Description Add a new user with a username, password, and optional public key.
Request Body

JSON object containing user information:


{
  "username": "testuser",
  "password": "secretpassword",
  "public_key": "-----BEGIN PUBLIC KEY-----..."
}
            
Response

HTTP/1.1 200 OK
{
  "message": "User added"
}
            

Delete User

DELETE /admin/users/:username

Delete an existing user by username.

Method DELETE
Endpoint /admin/users/:username
Description Deletes the specified user from the system.
Response

HTTP/1.1 200 OK
{
  "message": "User deleted"
}
            

Generate Key Pair for a User

POST /admin/users/:username/keypair

Generate a new private/public key pair for the specified user.

Method POST
Endpoint /admin/users/:username/keypair
Description Generates a new key pair and stores the public key for the user. The private key is returned in the response.
Response

Returns a JSON object containing the newly generated key pair:


HTTP/1.1 200 OK
{
  "private_key": "-----BEGIN PRIVATE KEY-----...",
  "public_key": "-----BEGIN PUBLIC KEY-----..."
}
            

Authenticate with Password

POST /admin/authenticate/password

Obtain a JWT token by providing username and password.

Method POST
Endpoint /admin/authenticate/password
Description Authenticates a user with password and returns a JWT token if successful.
Request Body

JSON object with username and password fields:


{
  "username": "testuser",
  "password": "secretpassword"
}
            
Response

HTTP/1.1 200 OK
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
            

Authenticate with JWT

POST /admin/authenticate/jwt

Verify an existing JWT and obtain a renewed token.

Method POST
Endpoint /admin/authenticate/jwt
Description Verifies a JWT and returns a new JWT if valid.
Request Body

JSON object with a token field:


{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
            
Response

HTTP/1.1 200 OK
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9... (new token)"
}
            

Set User Rules

POST /admin/users/:username/rules

Define or update the security rules for a specified user.

Method POST
Endpoint /admin/users/:username/rules
Description Sets security/ACL rules for a user.
Request Body

An array of rules. Each rule typically contains a resource pattern and a permission (e.g. "read", "write"). The exact structure may vary based on the server's rule definition:


[
  {
    "resource": "/stores/myStore/graphs/graph1",
    "permission": "read"
  },
  {
    "resource": "/stores/myStore",
    "permission": "write"
  }
]
            
Response

HTTP/1.1 200 OK
{
  "message": "Rule added"
}
            

CURL Examples


    # Example: Create a store and a graph, then import data.

    # 1. Create a new store
    curl -X POST http://localhost:8080/stores/myNewStore
    
    # 2. Create a new graph in that store
    curl -X POST http://localhost:8080/stores/myNewStore/graphs/myGraph
    
    # 3. Import N-Triples data from a file
    curl -X POST http://localhost:8080/stores/myNewStore/graphs/myGraph/import \
         -H "Content-Type: text/plain" \
         --data-binary @path/to/your/file.nt
    
    # 4. Query the data
    curl -X POST http://localhost:8080/stores/myNewStore/query \
         -H "Content-Type: text/plain" \
         --data "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"

Data Import

Query

SPARQL

GraphLake supports a subset of SPARQL features for querying RDF data. This document outlines the supported features, including query forms, patterns, and functions.

Query Forms

  1. SELECT
  2. ASK

Patterns

  1. Triple Patterns
  2. Group Graph Patterns
  3. Optional Patterns
  4. Filter Patterns

Functions

  1. Bound
  2. isIRI
  3. isLiteral
  4. isBlank
  5. STR
  6. LANG
  7. DATATYPE
  8. CONTAINS
  9. STRSTARTS
  10. STRENDS

GraphLake's SPARQL support includes essential query forms, patterns, and functions to perform effective graph data queries. Use the provided examples to construct and execute your SPARQL queries.
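For example, the following query uses only features from the lists above — triple patterns, an OPTIONAL pattern, and a FILTER built from the STR and CONTAINS functions (the IRIs are illustrative):

```sparql
SELECT ?person ?name ?email
WHERE {
  ?person <http://example.org#name> ?name .
  OPTIONAL { ?person <http://example.org#email> ?email }
  FILTER (CONTAINS(STR(?name), "Ali"))
}
```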

SemanticCypher

SemanticCypher is still under development. In essence it is OpenCypher with an added namespace directive; variables and identifiers change form, becoming gra?things:Person rather than Cypher's gra:Person. There are also some built-in properties, such as __id, which is the subject URI of a node.

GraphLake JS

GraphLake provides a JavaScript-based query language to interact with the graph data. This language allows you to perform various operations such as matching triples, committing transactions, and writing query results.

GraphLake executes the JavaScript and injects two objects into the runtime: _context and _writer. The context object is used to update and query the store; the writer object sends data back to the calling application.

Match Triples

_context.matchTriples(subject, predicate, object, datatype, isLiteral, graphs)

Description: Matches triples in the specified graphs.

Parameters:
  • subject, predicate, object: the values to match; an empty string matches any value.
  • datatype, isLiteral: constrain how the object position is matched (see the examples below).
  • graphs: an array of graph names to search.

Returns: An iterator over the matched triples. Calling next() returns null when no more triples remain.

Assert Triple

_context.assertTriple(subject, predicate, object, datatype, isLiteral, graph)

Description: Asserts a new triple in the specified graph.

Parameters:
  • subject, predicate, object: the triple to assert.
  • datatype, isLiteral: describe the object value; set isLiteral to false for IRI objects.
  • graph: the name of the graph to assert into.

Commit Transaction

_context.commit()

Description: Commits the current transaction.

Returns: true if the transaction committed successfully; otherwise false.

Delete Triple

_context.deleteTriple(subject, predicate, object, datatype, isLiteral, graph)

Description: Deletes a triple from the specified graph.

Parameters:
  • subject, predicate, object: the triple to delete.
  • datatype, isLiteral: describe the object value, as for assertTriple.
  • graph: the name of the graph to delete from.

Write Header

_writer.writeHeader(headers)

Description: Writes the header for the query result.

Parameters:
  • headers: an array of column-name strings.

Returns: Undefined.

Write Row

_writer.writeRow(row)

Description: Writes a row to the query result.

Parameters:
  • row: an array of values, one per column in the header.

Returns: Undefined.

Examples

Example 1: Matching Triples

  _writer.writeHeader(["Subject", "Predicate", "Object"]);
  
  let triplesIter = _context.matchTriples("http://example.org/subject", "", "", "", true, ["graph1"]);
  while (true) {
      let triple = triplesIter.next();
      if (triple == null) {
          break;
      }
      _writer.writeRow([triple.subject, triple.predicate, triple.object]);
  }
      

Explanation: This script matches all triples with the subject http://example.org/subject in graph1 and writes the results to the output.

Example 2: Simple Transaction

  // Delete an existing triple
  _context.deleteTriple("http://example.org/subject", "http://example.org/predicate", "http://example.org/object", "", false, "graph1");

  // Add two new triples
  _context.assertTriple("http://example.org/subject1", "http://example.org/predicate1", "http://example.org/object1", "", false, "graph1");
  _context.assertTriple("http://example.org/subject2", "http://example.org/predicate2", "http://example.org/object2", "", false, "graph1");

  // Commit the transaction
  _context.commit();
      

Explanation: This script deletes an existing triple, asserts two new triples, and commits all of the changes as a single transaction.

Example 3: Writing Headers and Rows

  _writer.writeHeader(["Id", "Name"]);
  
  let triplesIter = _context.matchTriples("http://example.org/person", "http://example.org/name", "", "", true, ["graph1"]);
  while (true) {
      let triple = triplesIter.next();
      if (triple == null) {
          break;
      }
      _writer.writeRow([triple.subject, triple.object]);
  }
      

Explanation: This script writes a header with columns "Id" and "Name", matches triples with the subject http://example.org/person and predicate http://example.org/name in graph1, and writes the results to the output.

Example 4: Matching Triples with Different Graphs

  _writer.writeHeader(["Subject", "Predicate", "Object"]);
  
  let graphs = ["graph1", "graph2"];
  let triplesIter = _context.matchTriples("http://example.org/subject", "", "", "", true, graphs);
  while (true) {
      let triple = triplesIter.next();
      if (triple == null) {
          break;
      }
      _writer.writeRow([triple.subject, triple.predicate, triple.object]);
  }
      

Explanation: This script matches all triples with the subject http://example.org/subject in both graph1 and graph2 and writes the results to the output.

Security

GraphLake uses signed JWTs to control access to the API. Tokens are fetched from the API by authenticating either with a JWT signed by a local private key or with a username and password. Users are managed on the server and can be given read, write, and owner access to stores and graphs.

When a security manager is enabled, certain endpoints require a valid JWT for authorization. The server applies ACL checks to ensure that a user can only access the stores and graphs permitted by their assigned rules. If you receive a 401 (Unauthorized) or 403 (Forbidden) response, check that your token is valid and has not expired, and that your user's rules grant access to the requested store or graph.

The developer edition runs unsecured.
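For secured editions, a client-side sketch of the authentication flow follows. The Bearer authorization scheme on subsequent requests is an assumption here; the document does not specify the header format.

```python
import json

def build_password_auth_request(username, password,
                                base="http://localhost:8080"):
    # POST /admin/authenticate/password with the documented JSON body.
    return {"method": "POST",
            "url": f"{base}/admin/authenticate/password",
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"username": username, "password": password})}

def with_token(request, token):
    # Attach the JWT returned by the authenticate endpoints to a later
    # request. The Bearer scheme is assumed, not documented.
    headers = dict(request.get("headers", {}))
    headers["Authorization"] = f"Bearer {token}"
    return {**request, "headers": headers}

auth_req = build_password_auth_request("testuser", "secretpassword")
query_req = with_token(
    {"method": "POST",
     "url": "http://localhost:8080/stores/myStore/query",
     "headers": {"Content-Type": "application/sparql"}},
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.example")
```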

Monitoring

GraphLake writes structured logs to stderr, which, when running in Docker, are captured by the Docker logs. We recommend shipping these logs to your preferred log management system and configuring any notifications you require.

VSCode client

To enable a great developer experience, we provide a VSCode extension for managing stores and graphs, running queries, and managing users. All operations can also be performed with curl or programmatically over HTTP in any language.

Installation

Support

If you have issues with GraphLake, please reach out to us on Discord, file an issue on the public GitHub repository, or send an email to "contact @ dataplatformsolutions.com".