Skip to content

Using Gremlin in Gaffer

It is possible to use Gremlin as an alternative querying language in Gaffer. To make Gremlin available however, there are some additional steps that need to be taken to ensure it is setup correctly.

Overview

Gremlin serves as a query layer for a graph that implements the Tinkerpop graph structure. As of v2.3.0 Gremlin is in the Gaffer REST API which provides a Websocket based traversal source similar to using a normal Gremlin server. This is the recommended approach and the easiest way to start using Gremlin on Gaffer.

If you wish to connect via the Java API, you can utilise the underlying 'GafferPop' library that can be utilised to enable Gremlin queries. This library can be included via maven in any project using the following dependency definition:

<dependency>
    <groupId>uk.gov.gchq.gaffer</groupId>
    <artifactId>tinkerpop</artifactId>
    <version>${gaffer.version}</version>
</dependency>

Both methods (REST API and Java API) utilise the same library that allows Tinkerpop to talk to a Gaffer graph. To actually spawn a Gremlin query a reference to a GraphTraversalSource is required, the following sections outline how to obtain this reference using the REST API.

Connecting Gremlin

As mentioned previously the recommended way to use Gremlin queries is via the Websocket in the Gaffer REST API. To do this you will need to provide a config file that sets up the Gaffer Tinkerpop library (a.k.a 'GafferPop'). The file can either be added to /gaffer/gafferpop.properties in the container, or at a custom path by setting the gaffer.gafferpop.properties key in the store.properties file. This file can be blank but it is still recommended to setup some default values.

Tip

Please see the section below on how to configure the GafferPop properties file.

Once the GafferPop properties file has been added, if you start the REST API a Gremlin websocket will be available at localhost:8080/gremlin by default. To connect to this socket you must use the GraphSON v3 format. Most standard Gremlin tools already default to this however, if connecting using gremlinpython you must set it in the driver connection like:

from gremlin_python.driver.serializer import GraphSONSerializersV3d0

g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8080/gremlin', 'g', message_serializer=GraphSONSerializersV3d0()))

Configuring the GafferPop Library

The gafferpop.properties, file is the configuration for GafferPop. If using the REST API there is no mandatory properties you need to set since you already will have configured the Graph in the existing store.properties file. However, adding some default values in for operation modifiers, such as a limit for GetAllElement operations, is good practice.

# Default operation config
gaffer.elements.getalllimit=5000
gaffer.elements.hasstepfilterstage=PRE_AGGREGATION

A full breakdown of the available properties is as follows:

Note

Many of these are for standalone GafferPop Graphs so may be ignored if using the REST API.

Property Key Description Used in REST API
gremlin.graph The Tinkerpop graph class we should use for construction. No
gaffer.graphId The graph ID of the Tinkerpop graph. No
gaffer.storeproperties The path to the store properties file. No
gaffer.schemas The path to the directory containing the graph schema files. No
gaffer.userId The default user ID for the Tinkerpop graph. No (User is always set via the UserFactory.)
gaffer.dataAuths The default data auths for the user to specify what operations can be performed No
gaffer.rest.timeout The timeout for gremlin queries submitted to the REST API in ms. Default is 2 mins if not specified. Yes
gaffer.operation.options Default Operation options in the form key:value (this can be overridden per query see here) Yes
gaffer.elements.getalllimit The default limit for unseeded queries e.g. g.V(). Yes
gaffer.elements.hasstepfilterstage The default stage to apply any has() steps e.g. PRE_AGGREGATION Yes