Version: 0.1.0
Date: January 18, 2025
GraphLake is a graph database that supports a unified OneGraph model, seamlessly combining property graphs and RDF. This means you do not need to choose between the two graph types; it supports both simultaneously, offering flexibility and versatility for a wide range of use cases.
Designed for scalability and performance, GraphLake supports both large analytical workloads and Online Transaction Processing (OLTP) scenarios. Its architecture draws inspiration from the MPP style of systems like Apache Iceberg and DeltaLake. However, GraphLake introduces automatic data partitioning, with representation and filtering mechanisms specifically optimized for graph data structures.
GraphLake is engineered to work efficiently with files stored locally on high-performance NVMe or SSD drives or in cloud storage solutions such as Amazon S3 (and S3-compatible options like MinIO) or Azure Blob Storage. The system allows independent scaling of compute and storage resources to match workload requirements. For instance, you can store large volumes of data cost-effectively while using a small compute instance, or opt for a configuration with smaller datasets and larger compute resources to support high-performance querying.
Leveraging parallel processing and memory, GraphLake delivers impressive performance, with the ability to scale hardware resources for even greater efficiency.
GraphLake is not offered as a managed service. Instead, it is designed to be deployed within your cloud environment or run on your hardware, giving you complete control over your setup and infrastructure. We provide a developer edition that is free to use indefinitely, making it easy to get started with GraphLake and explore its capabilities.
GraphLake is available in four editions. The developer edition is available now; commercial releases are coming soon. If you would like to know more or start using GraphLake commercially, please email "contact @ dataplatformsolutions.com".
GraphLake is distributed as a Docker image on Docker Hub. Either download and unpack the binary from the .zip package and place it on your path, or pull the Docker image:
docker pull dataplatformsolutions/graphlake:latest
To support deploying GraphLake into your cloud, we provide example Terraform scripts that can be adapted to your specific setup. Deployment of GraphLake through the AWS and Azure marketplaces is coming soon.
- AWS: Terraform scripts at https://github.com/dataplatformsolutions/graphlake-deploy (AWS Marketplace listing coming soon)
- Azure: Terraform scripts at https://github.com/dataplatformsolutions/graphlake-deploy (Azure Marketplace listing coming soon)
- Digital Ocean: Terraform scripts at https://github.com/dataplatformsolutions/graphlake-deploy
GraphLake supports the following startup config environment variables:
Environment Variable | Description
---|---
GRAPHLAKE_BACKEND_TYPE | The storage backend to use (local, s3, azure); only local is supported at the moment
GRAPHLAKE_STORE_PATH | The path where data is stored when using the local backend
GRAPHLAKE_PORT | The port to listen on
GRAPHLAKE_LOG_LEVEL | The log level to use: debug or info
GRAPHLAKE_CACHE_SIZE | The size of the datafile cache
GRAPHLAKE_IMPORT_LOCATION | Path relative to the store path; used to locate files for import
GRAPHLAKE_ADMIN_PASSWORD | Password for the admin user; used in the Business edition and up to secure GraphLake
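For a local (non-Docker) run, the same settings can be exported as shell variables before starting the binary. The values below are illustrative examples, not documented defaults:

```shell
# Example GraphLake configuration; all values are illustrative.
export GRAPHLAKE_BACKEND_TYPE=local        # only "local" is currently supported
export GRAPHLAKE_STORE_PATH=/var/lib/graphlake
export GRAPHLAKE_PORT=8080
export GRAPHLAKE_LOG_LEVEL=info            # or "debug" for more detail
export GRAPHLAKE_IMPORT_LOCATION=import    # relative to GRAPHLAKE_STORE_PATH
```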
To run GraphLake in a Docker container as a detached process, passing the environment variables and mapping a local folder to the configured data path, you would run:
docker run -d \
-e GRAPHLAKE_BACKEND_TYPE=local \
-e GRAPHLAKE_STORE_PATH=/data \
-e GRAPHLAKE_IMPORT_LOCATION=import \
-v /path/to/local/data:/data \
dataplatformsolutions/graphlake:latest
POST /stores/:store/query
Execute a query in the specified :store. GraphLake supports three query languages: SPARQL (core subset), SemanticCypher, and GraphLake JavaScript; see the query sections below for details on each language. Use one of the following content types to specify which query language you are using: application/sparql, application/x-graphlake-query-semanticcypher, application/x-graphlake-query-javascript.
Method | POST
---|---
Endpoint | /stores/:store/query
Description | Runs a query on a given store.
Request Body | A string containing the query. The format depends on the underlying query engine.
Response | JSON result of the query. Note that the query result structure can differ based on the query type.
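As a sketch of how a client might issue a SPARQL query over HTTP using only the Python standard library (the store name is a made-up example; the curl walkthrough later in this document uses the same localhost:8080 address):

```python
import urllib.request

# Build a query request against a hypothetical store named "myStore".
# The Content-Type header selects the query language (here: SPARQL).
req = urllib.request.Request(
    "http://localhost:8080/stores/myStore/query",
    data=b"SELECT ?s ?p ?o WHERE { ?s ?p ?o }",
    headers={"Content-Type": "application/sparql"},
    method="POST",
)

# With a server running, urllib.request.urlopen(req) would return the
# JSON query result; its structure depends on the query type.
```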
POST /stores/:store/graphs/:graph/import
Import data (in N-Triples or CSV format) into the specified :graph. To import CSV as nodes, use the content type application/x-graphlake-node+csv; for edges, use application/x-graphlake-edge+csv.
Method | POST
---|---
Endpoint | /stores/:store/graphs/:graph/import
Description | Imports data into the given graph. The data can be sent via the request body or by specifying a file parameter.
Query Parameters | file (optional): the name of a file to import, located relative to GRAPHLAKE_IMPORT_LOCATION.
Request Body | If no file parameter is given, the body contains the data to import. Use the appropriate content type so the server knows what to import.
Response | Returns a status indicating the result of the import.

CSV Structure

Nodes CSV: the CSV file for nodes should define each node with its unique identifier, label, and properties. Each row represents a single node.

Edges CSV: the CSV file for edges should define the relationships between nodes. Each row represents a single edge.
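The exact CSV column names are not pinned down here, so the sketch below uses hypothetical headers (id, label, and a name property column for nodes; from, to, and label for edges) purely to illustrate the one-row-per-node and one-row-per-edge shape:

```python
import csv
import io

# Hypothetical nodes CSV: identifier, label, and a property column.
nodes = io.StringIO()
writer = csv.writer(nodes)
writer.writerow(["id", "label", "name"])  # header; column names are illustrative
writer.writerow(["http://example.org/person/1", "Person", "Alice"])
writer.writerow(["http://example.org/person/2", "Person", "Bob"])

# Hypothetical edges CSV: one row per relationship between two node ids.
edges = io.StringIO()
writer = csv.writer(edges)
writer.writerow(["from", "to", "label"])  # header; column names are illustrative
writer.writerow(["http://example.org/person/1",
                 "http://example.org/person/2", "knows"])
```

Each file would then be POSTed to the import endpoint with the matching x-graphlake CSV content type.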
POST /stores/:store/graphs/:graph
Create an empty graph in the specified :store.
Method | POST
---|---
Endpoint | /stores/:store/graphs/:graph
Description | Creates a new graph with the provided name.
Request Body | No body is required. The :graph path parameter is the graph name.
Response |
GET /stores/:store/graphs
Retrieve a list of available graphs in the specified :store.
Method | GET
---|---
Endpoint | /stores/:store/graphs
Description | Lists all graph names in the store.
Response | An array of strings, each representing a graph name.
DELETE /stores/:store/graphs/:graph
Delete a specific graph within a store.
Method | DELETE
---|---
Endpoint | /stores/:store/graphs/:graph
Description | Deletes the specified graph.
Request Body | No body is required. The :graph path parameter identifies the graph.
Response |
POST /stores/:store
Create a new store.
Method | POST
---|---
Endpoint | /stores/:store
Description | Creates a new store using the provided name.
Request Body | No request body is required. :store is the store name.
Response |
GET /stores
Retrieve a list of all stores. If security is enabled, only stores accessible to the current user are returned.
Method | GET
---|---
Endpoint | /stores
Description | Lists all stores.
Response | An array of strings, each representing a store name.
DELETE /stores/:store
Delete a specified store. This operation is irreversible.
Method | DELETE
---|---
Endpoint | /stores/:store
Description | Deletes the specified store.
Response |
POST /admin/users
Create a new user in the system.
Method | POST
---|---
Endpoint | /admin/users
Description | Adds a new user with a username, password, and optional public key.
Request Body | JSON object containing the user information (username, password, and optional public key).
Response |
DELETE /admin/users/:username
Delete an existing user by username.
Method | DELETE
---|---
Endpoint | /admin/users/:username
Description | Deletes the specified user from the system.
Response |
POST /admin/users/:username/keypair
Generate a new private/public key pair for the specified user.
Method | POST
---|---
Endpoint | /admin/users/:username/keypair
Description | Generates a new key pair and stores the public key for the user. The private key is returned in the response.
Response | Returns a JSON object containing the newly generated key pair.
POST /admin/authenticate/password
Obtain a JWT token by providing username and password.
Method | POST
---|---
Endpoint | /admin/authenticate/password
Description | Authenticates a user with a password and returns a JWT token if successful.
Request Body | JSON object with the username and password.
Response | A JWT token on success.
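A sketch of obtaining a token with the Python standard library; the exact request field names (username, password) are an assumption based on the description above:

```python
import json
import urllib.request

# Hypothetical credentials; field names are assumed, not documented.
body = json.dumps({"username": "admin", "password": "secret"}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/admin/authenticate/password",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With a secured server running, urllib.request.urlopen(req) would
# return a JWT token on successful authentication.
```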
POST /admin/authenticate/jwt
Verify an existing JWT and obtain a renewed token.
Method | POST
---|---
Endpoint | /admin/authenticate/jwt
Description | Verifies a JWT and returns a new JWT if valid.
Request Body | JSON object with the JWT to verify.
Response | A renewed JWT token if the supplied token is valid.
POST /admin/users/:username/rules
Define or update the security rules for a specified user.
Method | POST
---|---
Endpoint | /admin/users/:username/rules
Description | Sets security/ACL rules for a user.
Request Body | An array of rules. Each rule typically contains resource patterns and permissions (e.g. "read", "write"). The exact structure may vary.
Response |
# Example: Create a store and a graph, then import data.
# 1. Create a new store
curl -X POST http://localhost:8080/stores/myNewStore
# 2. Create a new graph in that store
curl -X POST http://localhost:8080/stores/myNewStore/graphs/myGraph
# 3. Import N-Triples data from a file
curl -X POST http://localhost:8080/stores/myNewStore/graphs/myGraph/import \
-H "Content-Type: text/plain" \
--data-binary @path/to/your/file.nt
# 4. Query the data
curl -X POST http://localhost:8080/stores/myNewStore/query \
-H "Content-Type: application/sparql" \
--data "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
GraphLake supports a subset of SPARQL features for querying RDF data. This document outlines the supported features, including query forms, patterns, and functions.
SELECT ?subject ?predicate ?object
WHERE {
?subject ?predicate ?object.
}
ASK WHERE {
?subject ?predicate ?object.
}
SELECT ?subject ?object
WHERE {
?subject <http://example.org/predicate> ?object.
}
SELECT ?subject ?object
WHERE {
{
?subject <http://example.org/predicate1> ?object.
}
UNION
{
?subject <http://example.org/predicate2> ?object.
}
}
SELECT ?subject ?object ?optionalObject
WHERE {
?subject <http://example.org/predicate> ?object.
OPTIONAL { ?subject <http://example.org/optionalPredicate> ?optionalObject. }
}
SELECT ?subject ?object
WHERE {
?subject <http://example.org/predicate> ?object.
FILTER (?object > 10)
}
SELECT ?subject
WHERE {
?subject <http://example.org/predicate> ?object.
FILTER (BOUND(?object))
}
SELECT ?subject
WHERE {
?subject ?predicate ?object.
FILTER (isIRI(?subject))
}
SELECT ?subject
WHERE {
?subject ?predicate ?object.
FILTER (isLiteral(?object))
}
SELECT ?subject
WHERE {
?subject ?predicate ?object.
FILTER (isBlank(?subject))
}
SELECT ?subject (STR(?object) AS ?objectStr)
WHERE {
?subject ?predicate ?object.
}
SELECT ?subject (LANG(?object) AS ?lang)
WHERE {
?subject ?predicate ?object.
}
SELECT ?subject (DATATYPE(?object) AS ?datatype)
WHERE {
?subject ?predicate ?object.
}
SELECT ?subject
WHERE {
?subject ?predicate ?object.
FILTER (CONTAINS(STR(?object), "example"))
}
SELECT ?subject
WHERE {
?subject ?predicate ?object.
FILTER (STRSTARTS(STR(?object), "http://"))
}
SELECT ?subject
WHERE {
?subject ?predicate ?object.
FILTER (STRENDS(STR(?object), ".org"))
}
GraphLake's SPARQL support includes essential query forms, patterns, and functions to perform effective graph data queries. Use the provided examples to construct and execute your SPARQL queries.
SemanticCypher is still being developed. In essence it is OpenCypher with a namespace directive added, which changes the form of variables and identifiers to gra?things:Person rather than Cypher's gra:Person. There are also some built-in properties, such as __id, which is the subject URI of a node.
GraphLake provides a JavaScript-based query language to interact with the graph data. This language allows you to perform various operations such as matching triples, committing transactions, and writing query results.
GraphLake executes the JavaScript and injects two objects into the runtime: _context and _writer. The context object is used to update and query the store; the writer object allows data to be sent back to the calling application.
_context.matchTriples(subject, predicate, object, datatype, isLiteral, graphs)
Description: Matches triples in the specified graphs.
Parameters:

- subject (string): The subject of the triple.
- predicate (string): The predicate of the triple.
- object (string): The object of the triple.
- datatype (string): The datatype of the object.
- isLiteral (boolean): Whether the object is a literal.
- graphs (array of strings): The graphs to search in.

Returns: An iterator over the matched triples.
_context.assertTriple(subject, predicate, object, datatype, isLiteral, graph)
Description: Asserts a new triple in the specified graph.
Parameters:

- subject (string): The subject of the triple.
- predicate (string): The predicate of the triple.
- object (string): The object of the triple.
- datatype (string): The datatype of the object.
- isLiteral (boolean): Whether the object is a literal.
- graph (string): The graph to insert into.

_context.commit()
Description: Commits the current transaction.
Returns: true if the transaction succeeded, false otherwise.
_context.deleteTriple(subject, predicate, object, datatype, isLiteral, graph)
Description: Deletes a triple from the specified graph.
Parameters:

- subject (string): The subject of the triple.
- predicate (string): The predicate of the triple.
- object (string): The object of the triple.
- datatype (string): The datatype of the object.
- isLiteral (boolean): Whether the object is a literal.
- graph (string): The graph to delete from.

_writer.writeHeader(headers)
Description: Writes the header for the query result.
Parameters:

- headers (array of strings): The header columns.

Returns: Undefined.
_writer.writeRow(row)
Description: Writes a row to the query result.
Parameters:

- row (array of any): The row data.

Returns: Undefined.
_writer.writeHeader(["Subject", "Predicate", "Object"]);
let triplesIter = _context.matchTriples("http://example.org/subject", "", "", "", true, ["graph1"]);
while (true) {
let triple = triplesIter.next();
if (triple == null) {
break;
}
_writer.writeRow([triple.subject, triple.predicate, triple.object]);
}
Explanation: This script matches all triples with the subject http://example.org/subject in graph1 and writes the results to the output.
// Delete an existing triple
_context.deleteTriple("http://example.org/subject", "http://example.org/predicate", "http://example.org/object", "", false, "graph1");
// Add two new triples
_context.assertTriple("http://example.org/subject1", "http://example.org/predicate1", "http://example.org/object1", "", false, "graph1");
_context.assertTriple("http://example.org/subject2", "http://example.org/predicate2", "http://example.org/object2", "", false, "graph1");
// Commit the transaction
_context.commit();
Explanation: This script deletes an existing triple, asserts two new triples, and commits the transaction.
Example 3: Writing Headers and Rows
_writer.writeHeader(["Id", "Name"]);
let triplesIter = _context.matchTriples("http://example.org/person", "http://example.org/name", "", "", true, ["graph1"]);
while (true) {
let triple = triplesIter.next();
if (triple == null) {
break;
}
_writer.writeRow([triple.subject, triple.object]);
}
Explanation: This script writes a header with columns "Id" and "Name", matches triples with the subject http://example.org/person and predicate http://example.org/name in graph1, and writes the results to the output.
_writer.writeHeader(["Subject", "Predicate", "Object"]);
let graphs = ["graph1", "graph2"];
let triplesIter = _context.matchTriples("http://example.org/subject", "", "", "", true, graphs);
while (true) {
let triple = triplesIter.next();
if (triple == null) {
break;
}
_writer.writeRow([triple.subject, triple.predicate, triple.object]);
}
Explanation: This script matches all triples with the subject http://example.org/subject in both graph1 and graph2 and writes the results to the output.
GraphLake uses signed JWTs to control access to the API. Tokens are fetched from the API by authenticating either with a JWT signed by a local private key or with a username and password. Users are managed on the server and granted read, write, and owner access to stores and graphs.
When a security manager is enabled, certain endpoints require a valid JWT token for authorization. The server applies ACL checks to ensure that a user can only access the stores and graphs permitted by their assigned rules. If you receive a 401 (Unauthorized) or 403 (Forbidden) response, check that your token is valid and that your user's rules permit the operation.
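Assuming the token is presented as a standard HTTP Bearer token (an assumption; check your server configuration), a request to a protected endpoint might look like:

```python
import urllib.request

# Placeholder token; in practice this comes from /admin/authenticate/password.
token = "eyJhbGciOiJIUzI1NiJ9.example.signature"

req = urllib.request.Request("http://localhost:8080/stores", method="GET")
req.add_header("Authorization", "Bearer " + token)

# With a secured server running, urllib.request.urlopen(req) would list
# only the stores the authenticated user's rules permit.
```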
The developer edition runs unsecured.
GraphLake writes structured logs to stderr, which, when run in Docker, are captured in the Docker logs. We recommend shipping these logs to your preferred log management system and configuring any notifications you require.
To provide a great developer experience, we offer a VS Code extension for managing stores and graphs, running queries, and managing users. All operations can also be performed with curl or programmatically over HTTP in any language.
If you have issues with GraphLake, please reach out to us on Discord, file an issue on our public GitHub repository, or send an email to "contact @ dataplatformsolutions.com".