Extending Gaffer
Extending Gaffer can mean a few things to a developer, this guide aims to cover general use cases for how and why you may want to extend the capabilities or add customisation to Gaffer.
Writing Custom Classes
As Gaffer is Java based to create and load additional libraries you will first need to write custom classes. Gaffer has many places where custom classes can be used, such as custom operations, aggregation functions, element generators etc.
Depending on what type of class you are writing, e.g. an Operation
, you may
need to extend or implement one of the interface classes already in Gaffer.
Assuming you are using Maven for package management you can add Gaffer as
a dependency to gain access to its classes like below:
<dependency>
<groupId>uk.gov.gchq.gaffer</groupId>
<artifactId>${gaffer.component}</artifactId>
<version>${gaffer.version}</version>
</dependency>
Tip
See the components breakdown for a list of key Gaffer modules.
Writing an Operation
In Gaffer we use the term Operation
to describe some component of a query affecting a graph, these are
chained together in OperationChains
and this is how we execute
queries to our graphs. Whilst there are a wealth of existing
operations and their associated handlers provided out of the box with
Gaffer. In this section we'll talk about how you might go about adding a new
operation/handler with some simple examples.
The Operation Interface
All Operations must adhere to and implement the Operation
interface, this interface describes the building blocks of an operation.
One important distinction here is that an implementation of Operation
should not
contain the actual logic for an Operation
, think of these as a task definition
or an instruction manual for what an Operation
should do. This then goes hand in
hand with our OperationHandler
(s) which define the actual functionality,
either generically or split into multiple handlers for different stores.
Selecting an Interface
In general, most Operation
implementations don't actually directly implement the
Operation
Interface, they actually implement one or more extended interfaces.
For example the Input
and Output
interfaces extend on top of the core Operation
interface and concordantly the InputOutput
interface is a combination of these.
These all live inside the Gaffer Core library under the Operation
subdirectory
(see the component breakdown for
more details).
Implementing an Interface
The example we'll use here is the
ToSet
Operation, this takes an Iterable
of Elements
and converts it into a
Set
, a handy way to remove duplicate Elements
if you're doing multiple hops
and end up getting duplicates back.
Taking a look at the Javadoc
for this class you should be able to see this class implements both
InputOutput
, with an input type of Iterable<? extends T>
and an output type
of Set<? extends T>
and MultiInput<T>
. Within the class it also implements
all the requisite methods eg, shallowClone()
, setInput()
, getOptions()
etc.
How to Write the Handler
Using the same ToSet
Operation from before, now that we have this definition
of an operation, we need to actually tell Gaffer how to do it. We do this by
defining either a single generic Handler
and linking it to the Operation or in
many cases, we might want different implementations based on the store being used
so we'd write a number of operation handlers and decide which one is needed at
runtime.
Here is the ToSetHandler
which implements the appropriate Handler
interface (in this case the
OutputOperationHandler
). It takes in the ToSet
Operation that has been built
with all the options etc. defined and then does the actual doOperation()
logic to
stream our Iterable's contents into a LinkedHashSet
. This and most other
handlers live inside the Store library.
How does Gaffer know which Handler to use?
Handlers are registered in the Store
class via the addOpHandlers()
method,
specifically for our example addOperationHandler(ToSet.class, new
ToSetHandler<>())
is called to register it. Once this is done we can then
use our Operation
and its Handler
in a user defined OperationChain
to effect
some query on our graph data.
You can alternatively use the an operations declarations config file (typically
called operationsDeclarations.json
) which can be used to define the
Operation
and its Handler like below:
{
"operations": [
{
"operation": "uk.gov.gchq.gaffer.operation.impl.output.ToSet",
"handler": {
"class": "uk.gov.gchq.gaffer.store.operation.handler.output.ToSetHandler"
}
}
]
}
Writing an Element/Object Generator
There are two types of generators used, one to transform a domain object into a
Gaffer Element
and one to transform the other way. These generators are the
ElementGenerator
and ObjectGenerator
.
Living within the Core/Data section of the repository these generator interfaces
are quite simple and as with Operation
are extended by other interfaces to
provide a more nuanced set of interfaces you could choose to use over the core
Element/Object generator interfaces.
As an example we will walk through the one of the existing generators that can
take a Neo4j CSV file and convert the entries to Gaffer Elements
.
/**
* An {@link OpenCypherCsvElementGenerator}s that will generate
* Gaffer Elements from Neo4j CSV strings.
*/
@Since("2.0.0")
@Summary("Generates elements from a Neo4j CSV")
public class Neo4jCsvElementGenerator extends OpenCypherCsvElementGenerator {
@Override
protected LinkedHashMap<String, String> getFields() {
return new Neo4jCsvGenerator().getFields();
}
}
This class may appear very simple but the real functionality for this generator
is in the OpenCypherCsvElementGenerator
that it extends. This generator is an
abstract class that implements the ElementGenerator
interface and has a lot of
functionality for handling Open Cypher CSVs.
You can view the full class in the repository
which has the logic for converting the CSV format but the key bit to take note
of is the apply(final Iterable<? extends String> strings)
method. This method
is the main override which will be passed the lines of a CSV file to convert and
return a Iterable<? extends Element>
e.g. a list of Elements
.
This pattern is repeated across many of the different generators and can be used as inspiration for your custom generator. As with the Operations so long as you're implementing the appropriate interfaces it's hard to go far wrong.
Writing an Aggregation Function
We can provide a custom aggregation function in a number of ways, most of the
aggregation functionality is provided by the Koryphe library
sat adjacent to Gaffer. If we want to provide a new "generic" aggregator we
would add it in this Koryphe library. For example lets take a look at the very
simple Max
comparator, this takes a pair of Java 8 Comparables
and finds the
highest value one, this function is applied as an aggregation.
@Since("1.0.0")
@Summary("Calculates the max value")
public class Max extends KorypheBinaryOperator<Comparable> {
@Override
protected Comparable _apply(final Comparable a, final Comparable b) {
return a.compareTo(b) >= 0 ? a : b;
}
}
If we want to add something very specific to a store type or some other
restriction we can add this into the appropriate Store location within Gaffer.
An example of this is the HyperLogLogPlusAggregator
in the Sketches library,
this merges HLLPs together.
@Since("1.0.0")
@Summary("Aggregates HyperLogLogPlus objects")
@Deprecated
public class HyperLogLogPlusAggregator extends KorypheBinaryOperator<HyperLogLogPlus> {
@Override
protected HyperLogLogPlus _apply(final HyperLogLogPlus a, final HyperLogLogPlus b) {
try {
a.addAll(b);
} catch (final CardinalityMergeException exception) {
throw new RuntimeException("An Exception occurred when trying to aggregate the HyperLogLogPlus objects", exception);
}
return a;
}
}
You can also add aggregators by loading a JAR into Accumulo and
accessing it on the classpath, this is more something an administrator
customising their usage of the platform might do rather than a developer. For
development purposes it's usually good to check if you can make use of an
existing function in Koryphe first, if not then you can then write your own to
be loaded into the classpath. As long as the new function extends a
KorypheBinaryOperator<T>
then it can be used as an aggregation function for
Gaffer.
Loading Custom Libraries
Once you have written your custom classes to make them available the simplest way is to compile to a JAR and load on the Java class path at runtime.
If you are using the container images this is as simple as adding your JAR(s) to a directory in the image, this is explained in detail in the deployment guide.