GraphLake Documentation

Version: 0.1.0

Date: January 18, 2025

Introduction

GraphLake is a graph database that supports a unified OneGraph model, seamlessly combining property graphs and RDF. This means you do not need to choose between the two graph types; it supports both simultaneously, offering flexibility and versatility for a wide range of use cases.

Designed for scalability and performance, GraphLake supports both large analytical workloads and Online Transaction Processing (OLTP) scenarios. Its architecture draws inspiration from lakehouse table formats such as Apache Iceberg and Delta Lake. However, GraphLake introduces automatic data partitioning, with representation and filtering mechanisms specifically optimized for graph data structures.

GraphLake is engineered to work efficiently with files stored locally on high-performance NVMe or SSD drives or in cloud storage solutions such as Amazon S3 (and S3-compatible options like MinIO) or Azure Blob Storage. The system allows independent scaling of compute and storage resources to match workload requirements. For instance, you can store large volumes of data cost-effectively while using a small compute instance, or opt for a configuration with smaller datasets and larger compute resources to support high-performance querying.

Leveraging parallel processing and in-memory caching, GraphLake delivers impressive performance, with the ability to scale hardware resources for even greater throughput.

GraphLake is not offered as a managed service. Instead, it is designed to be deployed within your cloud environment or run on your hardware, giving you complete control over your setup and infrastructure. We provide a developer edition that is free to use indefinitely, making it easy to get started with GraphLake and explore its capabilities.

Versions

There are four editions of GraphLake.

Installation

The GraphLake developer edition is available now; commercial releases are coming soon. If you would like to know more or start using GraphLake commercially, please email "contact @ dataplatformsolutions.com".

GraphLake is available as a Docker image on Docker Hub.

Downloads

Either download and unpack the binary from the .zip package and place it on your path, or pull the docker image:

docker pull dataplatformsolutions/graphlake:latest

Cloud Deployments

To support deploying GraphLake into your own cloud, we provide example Terraform scripts that can be adapted to your specific setup. Deployment of GraphLake through the AWS and Azure marketplaces is coming soon.

AWS

AWS Terraform scripts: https://github.com/dataplatformsolutions/graphlake-deploy

Available in the AWS marketplace - coming soon!

Azure

Azure Terraform scripts: https://github.com/dataplatformsolutions/graphlake-deploy

Available in the Azure marketplace - coming soon!

Digital Ocean

Digital Ocean Terraform scripts: https://github.com/dataplatformsolutions/graphlake-deploy

Config and Startup

GraphLake supports the following startup config environment variables:

Environment Variable        Description
GRAPHLAKE_BACKEND_TYPE      The storage backend to use (local, s3, azure). Only local is supported at the moment.
GRAPHLAKE_STORE_PATH        The path used to store data when using the local backend.
GRAPHLAKE_PORT              The port to listen on.
GRAPHLAKE_LOG_LEVEL         The log level to use: debug or info.
GRAPHLAKE_CACHE_SIZE        The size of the datafile cache.
GRAPHLAKE_IMPORT_LOCATION   A path relative to the store path, used to locate files for import.
GRAPHLAKE_ADMIN_PASSWORD    The password for the admin user. Used by the Business edition and up to secure GraphLake.
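To see how these variables fit together, here is a small client-side sketch that collects them from the environment. The fallback defaults here are assumptions for the example, not documented GraphLake defaults.

```python
import os

def load_config(env=None):
    # Gather GraphLake's documented startup variables; the defaults below
    # are illustrative assumptions only.
    env = os.environ if env is None else env
    return {
        "backend_type": env.get("GRAPHLAKE_BACKEND_TYPE", "local"),
        "store_path": env.get("GRAPHLAKE_STORE_PATH", "/data"),
        "port": int(env.get("GRAPHLAKE_PORT", "8080")),
        "log_level": env.get("GRAPHLAKE_LOG_LEVEL", "info"),
        "cache_size": env.get("GRAPHLAKE_CACHE_SIZE"),
        "import_location": env.get("GRAPHLAKE_IMPORT_LOCATION", "import"),
    }

# Unset variables fall back to the illustrative defaults.
config = load_config({"GRAPHLAKE_BACKEND_TYPE": "local", "GRAPHLAKE_PORT": "9090"})
```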

To run GraphLake in a Docker container as a detached process, passing the environment variables, publishing the service port, and mapping a local folder to the configured data path, you would run:

docker run -d \
      -e GRAPHLAKE_BACKEND_TYPE=local \
      -e GRAPHLAKE_STORE_PATH=/data \
      -e GRAPHLAKE_PORT=8080 \
      -e GRAPHLAKE_IMPORT_LOCATION=import \
      -p 8080:8080 \
      -v /path/to/local/data:/data \
      dataplatformsolutions/graphlake:latest

API

Query

POST /stores/:store/query

Execute a query in the specified :store. GraphLake supports three query languages: SPARQL (a core subset), SemanticCypher, and GraphLake JavaScript. See the Query section for details on each language. Use the following content types to specify which query language you are using: application/sparql, application/x-graphlake-query-semanticcypher, application/x-graphlake-query-javascript.

Method POST
Endpoint /stores/:store/query
Description Runs a query on a given store.
Request Body

A string containing the query. The format depends on the underlying query engine.


POST /stores/myStore/query
Content-Type: text/plain

SELECT ?s ?p ?o WHERE { ?s ?p ?o }
            
Response

JSON result of the query. For example:


HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "s": "http://example.org#subject1",
      "p": "http://example.org#predicate1",
      "o": "http://example.org#object1"
    },
    ...
  ]
}
            

Note that the query result structure can differ based on the query language used.
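As a client-side illustration, the following sketch builds the query request described above as a plain dictionary and extracts rows from the documented SPARQL-style response shape. The base URL and helper names are assumptions for this example.

```python
import json

# Content types documented for the query endpoint.
CONTENT_TYPES = {
    "sparql": "application/sparql",
    "semanticcypher": "application/x-graphlake-query-semanticcypher",
    "javascript": "application/x-graphlake-query-javascript",
}

def build_query_request(store, query, language, base="http://localhost:8080"):
    # Describe the POST /stores/:store/query request without sending it.
    return {
        "method": "POST",
        "url": f"{base}/stores/{store}/query",
        "headers": {"Content-Type": CONTENT_TYPES[language]},
        "body": query,
    }

req = build_query_request("myStore", "SELECT ?s ?p ?o WHERE { ?s ?p ?o }", "sparql")

# Extract rows from the documented response shape.
response_body = ('{"results": [{"s": "http://example.org#subject1", '
                 '"p": "http://example.org#predicate1", '
                 '"o": "http://example.org#object1"}]}')
rows = [(r["s"], r["p"], r["o"]) for r in json.loads(response_body)["results"]]
```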


Import Data into a Graph

POST /stores/:store/graphs/:graph/import

Import data (in N-Triples or CSV format) into the specified :graph. To import CSV rows as nodes, use the content type application/x-graphlake-node+csv; for edges, use application/x-graphlake-edge+csv.

Method POST
Endpoint /stores/:store/graphs/:graph/import
Description Imports data into the given graph. The data can be sent via request body or by specifying a file parameter.
Query Parameters
  • file (optional): The name of the file (located in the configured import directory). If not provided, the server expects data in the request body.
Request Body

If the file query parameter is not provided, send the data in the request body:


POST /stores/myStore/graphs/myGraph/import
Content-Type: application/n-triples

<http://example.org#subject> <http://example.org#predicate> <http://example.org#object> .
            

Use the appropriate content type so the server knows how to interpret the data being imported.
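A hypothetical helper sketch tying together the body-versus-file choice and the content types above (the base URL, the helper names, and the edge CSV content type are assumptions for illustration):

```python
from urllib.parse import urlencode

# Content types as documented above; the edge variant is an assumption.
CONTENT_TYPES = {
    "ntriples": "application/n-triples",
    "nodes_csv": "application/x-graphlake-node+csv",
    "edges_csv": "application/x-graphlake-edge+csv",  # assumed edge variant
}

def build_import_request(store, graph, fmt, data=None, file=None,
                         base="http://localhost:8080"):
    # Either send inline data in the body, or name a file that lives in
    # the server's configured import directory (GRAPHLAKE_IMPORT_LOCATION).
    url = f"{base}/stores/{store}/graphs/{graph}/import"
    if file is not None:
        url += "?" + urlencode({"file": file})
    return {"method": "POST", "url": url,
            "headers": {"Content-Type": CONTENT_TYPES[fmt]},
            "body": data}

req = build_import_request("myStore", "myGraph", "ntriples", file="people.nt")
```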

CSV Structure Documentation

Nodes CSV

The CSV file for nodes should define each node with its unique identifier, label, and properties. Each row represents a single node.

Required Columns:
  • id: A unique identifier for the node.
  • label: The type or label of the node (e.g., Person, Company).
Optional Columns:
  • Additional properties of the node (e.g., name, age, location).
Example:

id,label,name,age
1,Person,Alice,30
2,Person,Bob,40
3,Company,Acme Corp,
Edges CSV

The CSV file for edges should define the relationships between nodes. Each row represents a single edge.

Required Columns:
  • source: The ID of the source node.
  • target: The ID of the target node.
  • relationship: The type of relationship (e.g., FRIEND, WORKS_AT).
Optional Columns:
  • Additional properties of the relationship (e.g., weight, timestamp).
Example:

source,target,relationship,weight
1,2,FRIEND,1.0
2,3,WORKS_AT,0.8
Usage Notes
  • The id in the nodes CSV and the source and target in the edges CSV must match to correctly associate nodes and relationships.
  • Ensure that the CSV files use a consistent delimiter (e.g., commas).
  • Handle missing data (e.g., empty cells) appropriately based on the graph database's requirements.
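When producing these files programmatically, the standard library's CSV support keeps delimiters and quoting consistent. A small sketch matching the structures above:

```python
import csv
import io

def to_csv(header, rows):
    # Render a header row plus data rows as CSV text.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Nodes: id and label are required; other columns are node properties.
nodes_csv = to_csv(["id", "label", "name", "age"],
                   [[1, "Person", "Alice", 30],
                    [2, "Person", "Bob", 40],
                    [3, "Company", "Acme Corp", ""]])  # empty cell: no age

# Edges: source and target must match node ids above.
edges_csv = to_csv(["source", "target", "relationship", "weight"],
                   [[1, 2, "FRIEND", 1.0],
                    [2, 3, "WORKS_AT", 0.8]])
```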
Response

Returns 200 OK on successful import.


Create Graph

POST /stores/:store/graphs/:graph

Create an empty graph in the specified :store.

Method POST
Endpoint /stores/:store/graphs/:graph
Description Creates a new graph with the provided name.
Request Body No body is required. The :graph path parameter is the graph name.
Response

HTTP/1.1 200 OK
{
  "message": "Graph created"
}
            

List Graphs

GET /stores/:store/graphs

Retrieve a list of available graphs in the specified :store.

Method GET
Endpoint /stores/:store/graphs
Description Lists all graph names in the store.
Response

An array of strings, each representing a graph name.


HTTP/1.1 200 OK
Content-Type: application/json

[
  "graph1",
  "graph2",
  ...
]
            

Delete Graph

DELETE /stores/:store/graphs/:graph

Delete a specific graph within a store.

Method DELETE
Endpoint /stores/:store/graphs/:graph
Description Deletes the specified graph.
Request Body No body is required. The :graph path parameter identifies the graph.
Response

HTTP/1.1 200 OK
{
  "message": "Graph deleted"
}
            

Create Store

POST /stores/:store

Create a new store.

Method POST
Endpoint /stores/:store
Description Creates a new store using the provided name.
Request Body No request body is required. :store is the store name.
Response

HTTP/1.1 200 OK
{
  "message": "Store created"
}
            

List Stores

GET /stores

Retrieve a list of all stores. If security is enabled, only stores accessible to the current user are returned.

Method GET
Endpoint /stores
Description Lists all stores.
Response

An array of strings, each representing a store name.


HTTP/1.1 200 OK
[
  "store1",
  "store2",
  ...
]
            

Delete Store

DELETE /stores/:store

Delete a specified store. This operation is irreversible.

Method DELETE
Endpoint /stores/:store
Description Deletes the specified store.
Response

HTTP/1.1 200 OK
{
  "message": "Store deleted"
}
            

Add User

POST /admin/users

Create a new user in the system.

Method POST
Endpoint /admin/users
Description Add a new user with a username, password, and optional public key.
Request Body

JSON object containing user information:


{
  "username": "testuser",
  "password": "secretpassword",
  "public_key": "-----BEGIN PUBLIC KEY-----..."
}
            
Response

HTTP/1.1 200 OK
{
  "message": "User added"
}
            

Delete User

DELETE /admin/users/:username

Delete an existing user by username.

Method DELETE
Endpoint /admin/users/:username
Description Deletes the specified user from the system.
Response

HTTP/1.1 200 OK
{
  "message": "User deleted"
}
            

Generate Key Pair for a User

POST /admin/users/:username/keypair

Generate a new private/public key pair for the specified user.

Method POST
Endpoint /admin/users/:username/keypair
Description Generates a new key pair and stores the public key for the user. The private key is returned in the response.
Response

Returns a JSON object containing the newly generated key pair:


HTTP/1.1 200 OK
{
  "private_key": "-----BEGIN PRIVATE KEY-----...",
  "public_key": "-----BEGIN PUBLIC KEY-----..."
}
            

Authenticate with Password

POST /admin/authenticate/password

Obtain a JWT token by providing username and password.

Method POST
Endpoint /admin/authenticate/password
Description Authenticates a user with password and returns a JWT token if successful.
Request Body

JSON object with username and password fields:


{
  "username": "testuser",
  "password": "secretpassword"
}
            
Response

HTTP/1.1 200 OK
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
            

Authenticate with JWT

POST /admin/authenticate/jwt

Verify an existing JWT and obtain a renewed token.

Method POST
Endpoint /admin/authenticate/jwt
Description Verifies a JWT and returns a new JWT if valid.
Request Body

JSON object with a token field:


{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
            
Response

HTTP/1.1 200 OK
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9... (new token)"
}
            

Set User Rules

POST /admin/users/:username/rules

Define or update the security rules for a specified user.

Method POST
Endpoint /admin/users/:username/rules
Description Sets security/ACL rules for a user.
Request Body

An array of rules. Each rule typically contains a resource pattern and a permission (e.g. "read", "write"). The exact structure may vary based on the server's rule definition:


[
  {
    "resource": "/stores/myStore/graphs/graph1",
    "permission": "read"
  },
  {
    "resource": "/stores/myStore",
    "permission": "write"
  }
]
            
Response

HTTP/1.1 200 OK
{
  "message": "Rule added"
}
            

CURL Examples


    # Example: Create a store and a graph, then import data.

    # 1. Create a new store
    curl -X POST http://localhost:8080/stores/myNewStore
    
    # 2. Create a new graph in that store
    curl -X POST http://localhost:8080/stores/myNewStore/graphs/myGraph
    
    # 3. Import N-Triples data from a file
    curl -X POST http://localhost:8080/stores/myNewStore/graphs/myGraph/import \
         -H "Content-Type: text/plain" \
         --data-binary @path/to/your/file.nt
    
    # 4. Query the data
    curl -X POST http://localhost:8080/stores/myNewStore/query \
         -H "Content-Type: text/plain" \
         --data "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"

Data Import

Query

SPARQL

GraphLake supports a subset of SPARQL features for querying RDF data. This document outlines the supported features, including query forms, patterns, and functions.

Query Forms

  1. SELECT
  2. ASK

Patterns

  1. Triple Patterns
  2. Group Graph Patterns
  3. Optional Patterns
  4. Filter Patterns

Functions

  1. Bound
  2. isIRI
  3. isLiteral
  4. isBlank
  5. STR
  6. LANG
  7. DATATYPE
  8. CONTAINS
  9. STRSTARTS
  10. STRENDS

GraphLake's SPARQL support includes essential query forms, patterns, and functions to perform effective graph data queries. Use the provided examples to construct and execute your SPARQL queries.
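For example, the following query uses only features from the lists above — triple patterns, an OPTIONAL pattern, and a FILTER built from the STR and CONTAINS functions (the IRIs are illustrative):

```sparql
SELECT ?person ?name ?email
WHERE {
  ?person <http://example.org#name> ?name .
  OPTIONAL { ?person <http://example.org#email> ?email }
  FILTER (CONTAINS(STR(?name), "Ali"))
}
```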

SemanticCypher

SemanticCypher is still under development. In essence it is OpenCypher with an added namespace directive; variables and identifiers change form, becoming gra?things:Person rather than Cypher's gra:Person. There are also some built-in properties, such as __id, which is the subject URI of a node.

GraphLake JS

GraphLake provides a JavaScript-based query language to interact with the graph data. This language allows you to perform various operations such as matching triples, committing transactions, and writing query results.

GraphLake executes the JavaScript and injects two objects into the runtime: _context and _writer. The context object is used to update and query the store; the writer object sends data back to the calling application.

Match Triples

_context.matchTriples(subject, predicate, object, datatype, isLiteral, graphs)

Description: Matches triples in the specified graphs.

Parameters:
  • subject, predicate, object: the values to match; an empty string matches any value.
  • datatype, isLiteral: constrain how the object position is matched (see the examples below).
  • graphs: an array of graph names to search.

Returns: An iterator over the matched triples. Calling next() returns null when no more triples remain.

Assert Triple

_context.assertTriple(subject, predicate, object, datatype, isLiteral, graph)

Description: Asserts a new triple in the specified graph.

Parameters:
  • subject, predicate, object: the triple to assert.
  • datatype, isLiteral: describe the object value; set isLiteral to false for IRI objects.
  • graph: the name of the graph to assert into.

Commit Transaction

_context.commit()

Description: Commits the current transaction.

Returns: true if the transaction committed successfully; otherwise false.

Delete Triple

_context.deleteTriple(subject, predicate, object, datatype, isLiteral, graph)

Description: Deletes a triple from the specified graph.

Parameters:
  • subject, predicate, object: the triple to delete.
  • datatype, isLiteral: describe the object value, as for assertTriple.
  • graph: the name of the graph to delete from.

Write Header

_writer.writeHeader(headers)

Description: Writes the header for the query result.

Parameters:
  • headers: an array of column-name strings.

Returns: Undefined.

Write Row

_writer.writeRow(row)

Description: Writes a row to the query result.

Parameters:
  • row: an array of values, one per column in the header.

Returns: Undefined.

Examples

Example 1: Matching Triples

  _writer.writeHeader(["Subject", "Predicate", "Object"]);
  
  let triplesIter = _context.matchTriples("http://example.org/subject", "", "", "", true, ["graph1"]);
  while (true) {
      let triple = triplesIter.next();
      if (triple == null) {
          break;
      }
      _writer.writeRow([triple.subject, triple.predicate, triple.object]);
  }
      

Explanation: This script matches all triples with the subject http://example.org/subject in graph1 and writes the results to the output.

Example 2: Simple Transaction

  // Delete an existing triple
  _context.deleteTriple("http://example.org/subject", "http://example.org/predicate", "http://example.org/object", "", false, "graph1");

  // Add two new triples
  _context.assertTriple("http://example.org/subject1", "http://example.org/predicate1", "http://example.org/object1", "", false, "graph1");
  _context.assertTriple("http://example.org/subject2", "http://example.org/predicate2", "http://example.org/object2", "", false, "graph1");

  // Commit the transaction
  _context.commit();
      

Explanation: This script deletes an existing triple, asserts two new triples, and commits all of the changes as a single transaction.

Example 3: Writing Headers and Rows

  _writer.writeHeader(["Id", "Name"]);
  
  let triplesIter = _context.matchTriples("http://example.org/person", "http://example.org/name", "", "", true, ["graph1"]);
  while (true) {
      let triple = triplesIter.next();
      if (triple == null) {
          break;
      }
      _writer.writeRow([triple.subject, triple.object]);
  }
      

Explanation: This script writes a header with columns "Id" and "Name", matches triples with the subject http://example.org/person and predicate http://example.org/name in graph1, and writes the results to the output.

Example 4: Matching Triples with Different Graphs

  _writer.writeHeader(["Subject", "Predicate", "Object"]);
  
  let graphs = ["graph1", "graph2"];
  let triplesIter = _context.matchTriples("http://example.org/subject", "", "", "", true, graphs);
  while (true) {
      let triple = triplesIter.next();
      if (triple == null) {
          break;
      }
      _writer.writeRow([triple.subject, triple.predicate, triple.object]);
  }
      

Explanation: This script matches all triples with the subject http://example.org/subject in both graph1 and graph2 and writes the results to the output.

Security

GraphLake uses signed JWTs to control access to the API. Tokens are fetched from the API by authenticating either with a JWT signed by a local private key or with a username and password. Users are managed on the server and can be given read, write, and owner access to stores and graphs.

When a security manager is enabled, certain endpoints require a valid JWT for authorization. The server applies ACL checks to ensure that a user can only access the stores and graphs permitted by their assigned rules. If you receive a 401 (Unauthorized) or 403 (Forbidden) response, check that your token is valid and has not expired, and that your user's rules grant access to the requested store or graph.

The developer edition runs unsecured.
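For secured editions, a client-side sketch of the authentication flow follows. The Bearer authorization scheme on subsequent requests is an assumption here; the document does not specify the header format.

```python
import json

def build_password_auth_request(username, password,
                                base="http://localhost:8080"):
    # POST /admin/authenticate/password with the documented JSON body.
    return {"method": "POST",
            "url": f"{base}/admin/authenticate/password",
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"username": username, "password": password})}

def with_token(request, token):
    # Attach the JWT returned by the authenticate endpoints to a later
    # request. The Bearer scheme is assumed, not documented.
    headers = dict(request.get("headers", {}))
    headers["Authorization"] = f"Bearer {token}"
    return {**request, "headers": headers}

auth_req = build_password_auth_request("testuser", "secretpassword")
query_req = with_token(
    {"method": "POST",
     "url": "http://localhost:8080/stores/myStore/query",
     "headers": {"Content-Type": "application/sparql"}},
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.example")
```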

Monitoring

GraphLake writes structured logs to stderr, which, when running in Docker, are captured by the Docker logs. We recommend shipping these logs to your preferred log management system and configuring any notifications you require.

VSCode client

To enable a great developer experience, we provide a VSCode extension for managing stores and graphs, running queries, and managing users. All operations can also be performed with curl or programmatically over HTTP in any language.

Installation

Support

If you have issues with GraphLake, please reach out to us on Discord, file an issue on the public GitHub repository, or send an email to "contact @ dataplatformsolutions.com".