Connecting Gremlin to Gaffer
Gremlin can be used as an alternative query language in Gaffer. Making it available requires some additional steps to connect to a Gaffer graph via this interface.
Overview
Gremlin serves as a query layer for a graph that implements the Tinkerpop graph structure. As of v2.1.0, Gaffer provides a library that enables Gremlin queries. This library can be included via Maven in any project using the following dependency definition:
<dependency>
<groupId>uk.gov.gchq.gaffer</groupId>
<artifactId>tinkerpop</artifactId>
<version>${gaffer.version}</version>
</dependency>
The library contains the graph implementation that allows Tinkerpop to talk to a Gaffer graph and is generally all that is needed to provide the functionality. To actually run Gremlin queries, however, a connection to what's known as a GraphTraversalSource is required, which is the class from which Gremlin traversals are spawned. To do this we recommend using the provided Gremlin server framework, which can be configured to use the Gaffer Tinkerpop implementation so that an endpoint is available for Gremlin queries.
Connecting to An Existing Accumulo Backed Graph
The recommended way to provide a Gremlin interface to an existing Gaffer instance is to connect directly to the same Accumulo store. Connecting this way means Gremlin communicates with the store in a similar way to the Gaffer REST API and gives the best performance available when using Gremlin, although there may still be a performance hit.
Note
It is possible to attach to other store types in a similar manner, usually through a proxy store or federated store.
The general connection diagram looks something like the following:
flowchart LR
A(["User"])
--> B("Gremlin Server")
--> C("Accumulo Store")
To establish this connection you can make use of the existing gaffer-gremlin OCI image, which is an extension of the existing gremlin-server image. This provides the Tinkerpop library which allows users to connect Gaffer graphs, as well as some pre-installed configuration to get up and running quickly.
Note
You will need to configure the default gaffer-gremlin image for your environment; please continue reading to learn more.
The gaffer-gremlin Image
To use the image you will need to provide the normal Gaffer configuration files for your environment along with a new GafferPop-specific file (similar to the standard graph config JSON). They are:
store.properties - Gaffer store configuration, this should match the existing graph you are connecting to.
elements.json and types.json - The schema files for the graph you wish to connect to.
gafferpop.properties - Configuration for the Gaffer Tinkerpop library (Gafferpop).
Please read the subsections below on how to configure these files. Once these are configured you can use bind mounts to make them available when running the image:
docker run \
--name gaffer-gremlin \
--publish 8182:8182 \
--volume "$(pwd)/store.properties:/opt/gremlin-server/conf/gaffer/store.properties" \
--volume "$(pwd)/schema:/opt/gremlin-server/conf/gaffer/schema" \
--volume "$(pwd)/gafferpop.properties:/opt/gremlin-server/conf/gafferpop/gafferpop.properties" \
tinkerpop/gremlin-server:latest gremlin-server.yaml
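Docker treats a `--volume` source that is a bare name rather than an absolute path as a named volume, so it helps to resolve the host paths before launching the container. The following is a minimal Python sketch of that idea, not part of Gaffer: the function name `build_docker_run` is hypothetical, the container paths are copied from the command above, and the image reference is deliberately left as a required argument for you to supply.

```python
from pathlib import Path

# Container-side paths copied from the docker run example.
CONTAINER_CONF = "/opt/gremlin-server/conf"
MOUNTS = {
    "store.properties": f"{CONTAINER_CONF}/gaffer/store.properties",
    "schema": f"{CONTAINER_CONF}/gaffer/schema",
    "gafferpop.properties": f"{CONTAINER_CONF}/gafferpop/gafferpop.properties",
}


def build_docker_run(config_dir, image, name="gaffer-gremlin", port=8182):
    """Return a docker run argument list that bind mounts absolute host paths.

    `image` is whatever image reference applies to your environment
    (an assumption of this sketch; it is not filled in here).
    """
    base = Path(config_dir).resolve()
    # Fail early if any of the expected files or directories is absent.
    missing = [src for src in MOUNTS if not (base / src).exists()]
    if missing:
        raise FileNotFoundError(f"missing in {base}: {', '.join(missing)}")
    args = ["docker", "run", "--name", name, "--publish", f"{port}:{port}"]
    for src, dest in MOUNTS.items():
        args += ["--volume", f"{base / src}:{dest}"]
    args.append(image)
    return args
```

The returned list can be passed directly to `subprocess.run(..., check=True)`, which avoids shell quoting issues with paths that contain spaces.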
Configuring the Store Properties
The store properties file should be largely identical to the store properties used on the main Gaffer deployment. Its main purpose is to ensure that the Gremlin server connects to the same Accumulo cluster.
An example file is given below; please read the specific Accumulo store documentation for more detail:
gaffer.store.class=uk.gov.gchq.gaffer.accumulostore.AccumuloStore
gaffer.store.properties.class=uk.gov.gchq.gaffer.accumulostore.AccumuloProperties
accumulo.instance=accumulo
accumulo.zookeepers=zookeeper
accumulo.user=root
accumulo.password=secret
# General store config
gaffer.cache.service.class=uk.gov.gchq.gaffer.cache.impl.HashMapCacheService
gaffer.store.job.tracker.enabled=true
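Because this file needs to point at the same Accumulo cluster as the main deployment, it can be useful to compare the two files before starting the server. The sketch below is one possible way to do that in Python. The helper names are hypothetical, the parser only handles simple `key=value` lines and `#` comments (not the full Java properties syntax), and the choice of which keys to compare is an assumption based on the example above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Keys taken from the example file; adjust to suit your deployment.
CONNECTION_KEYS = (
    "gaffer.store.class",
    "accumulo.instance",
    "accumulo.zookeepers",
    "accumulo.user",
)


def connection_mismatches(main_text, gremlin_text, keys=CONNECTION_KEYS):
    """Return {key: (main_value, gremlin_value)} for each key whose values differ."""
    main = parse_properties(main_text)
    gremlin = parse_properties(gremlin_text)
    return {
        k: (main.get(k), gremlin.get(k))
        for k in keys
        if main.get(k) != gremlin.get(k)
    }
```

An empty result means the compared keys agree; any entry shows the main deployment's value first and the Gremlin server's value second.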
Configuring the Gafferpop Library
The gafferpop.properties file is the configuration for the Gaffer implementation of Tinkerpop (a.k.a. Gafferpop). Most of the setup here is for the construction of the Gafferpop Graph instance, which we want to run with the store.properties we've already configured. An example gafferpop.properties would look like the following:
# The Tinkerpop graph class we should use
gremlin.graph=uk.gov.gchq.gaffer.tinkerpop.GafferPopGraph
gaffer.graphId=existingGraph
gaffer.storeproperties=conf/gaffer/store.properties
gaffer.userId=user01
Note
It is important that the graphId here matches the ID of the main graph you wish to connect to, as this controls which Accumulo table is connected to.
Many of the properties in the example above should be self-explanatory. A full breakdown of the available properties is as follows:
Property Key | Description |
---|---|
gremlin.graph | The Tinkerpop graph class we should use |
gaffer.graphId | The graph ID of the Tinkerpop graph |
gaffer.storeproperties | The path to the store properties file |
gaffer.schemas | The path to the directory containing the graph schema files |
gaffer.userId | The default user ID for the Tinkerpop graph (see the authentication section) |
gaffer.dataAuths | The default data auths for the user to specify what operations can be performed |
gaffer.operation.options | Default Operation options in the form key:value (this can be overridden per query) |
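Property keys are easy to mistype, for instance writing `gaffer.storeProperties` instead of `gaffer.storeproperties`. As a rough illustration, the Python sketch below flags keys in a gafferpop.properties file that do not appear in the table above. The function name `unknown_keys` is hypothetical, and the check assumes the table is the complete set of keys you intend to use; it does not validate the values.

```python
# Property keys listed in the table above.
KNOWN_KEYS = {
    "gremlin.graph",
    "gaffer.graphId",
    "gaffer.storeproperties",
    "gaffer.schemas",
    "gaffer.userId",
    "gaffer.dataAuths",
    "gaffer.operation.options",
}


def unknown_keys(text):
    """Return the keys in a properties text that are not in KNOWN_KEYS, in file order."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key=value line.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in KNOWN_KEYS:
            found.append(key)
    return found
```

The comparison is case-sensitive on purpose, since a key that differs only in capitalisation is exactly the kind of typo this is meant to surface.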
Configuring the Gremlin Server
The underlying Gremlin server can also be configured if required. The gaffer-gremlin image comes with an existing YAML configuration based on the example from the Tinkerpop repository.
This file should be suitable for most use cases but a custom one can be provided
via a bind mount. If supplying a custom file please ensure you still include the
following sections:
Ensure the gafferpop.properties file is set in the graphs section, with an entry pointing at the location of the file inside the container (conf/gafferpop/gafferpop.properties when using the bind mount shown earlier).
Ensure the Gaffer plugin is loaded for Gremlin by adding it to the list of plugins in the plugins section.
Tip
See the Tinkerpop docs for more information on Gremlin server configuration.
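After changing the server configuration, a quick sanity check is to confirm that something is accepting TCP connections on the Gremlin port. The Python sketch below assumes the server is reachable on localhost at port 8182 (the port published in the docker run example). The function name is hypothetical, and the check only shows that the port is open, not that Gremlin queries will succeed.

```python
import socket


def endpoint_is_listening(host="localhost", port=8182, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

A `False` result usually means the container is not running, the port is not published, or the server failed to start, so the container logs are the next place to look.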
User Authentication
Full user authentication is possible with the Gremlin server using the framework provided by standard Tinkerpop. The GafferPop implementation provides a functional Authoriser class that will handle passing the authenticated user to the underlying Gaffer graph.
To activate user auth with the Gremlin server you must provide the classes you wish to use in the Gremlin server's YAML file like so:
# This should be a deployment specific class
authentication: {
authenticator: uk.gov.gchq.gaffer.tinkerpop.server.auth.ExampleGafferPopAuthenticator
}
# This class is necessary for correctly forwarding the user to Gaffer
authorization: {
authorizer: uk.gov.gchq.gaffer.tinkerpop.server.auth.GafferPopAuthoriser
}
The authorizer should always be the GafferPopAuthoriser, as this is what handles denying invalid queries for GafferPop and passing the user on to the Gaffer graph for fine-grained security.
Note
The GafferPopAuthoriser will deny attempts to set the user ID via a with("userId", <id>) step in the Gremlin query.
The authenticator should be a class specific to the auth mechanism for your deployment, e.g. LDAP. An example class, ExampleGafferPopAuthenticator, is provided as a starting point but does not do any actual authentication, so it should not be used in production.
Tip
Tinkerpop provides some implementations of Authenticators for standard mechanisms such as Kerberos. Please see the Tinkerpop documentation for more info.