Using Sketches with the REST API
This page explains some nuances and special steps required when using classes from the Sketches library with the REST API. If you just want to know how to use the Sketches library to work with cardinality, see the cardinality docs page.
Sketches Library
To learn more about the Sketches library, see the advanced properties reference page.
The Sketches library is included by default with the Map and Accumulo stores, because the sketches-library is a dependency in each of the respective store modules' poms. Serialisation is also handled by default: SketchesJsonModules is returned by the getJsonSerialiserModules method in both the Map and Accumulo property classes. The modules are then loaded by the JSONSerialiser and used during the deserialisation of the REST JSON queries.
HyperLogLog sketches
Gaffer currently supports the Datasketches HllSketch and Clearspring HyperLogLogPlus algorithms. The Clearspring HyperLogLogPlus has been deprecated in Gaffer and we recommend the Datasketches HllSketch to users for the reasons described in the advanced properties guide.
The HllSketch and HyperLogLogPlus sketches can be used to store an approximation of the cardinality of an element. The JSON of the query is converted to Java objects during deserialisation using the JSONSerialiser. During the deserialisation, the sketch's JSON representation is converted to a Java object using the ObjectMapper module, which uses the relevant deserialiser (HyperLogLogPlusJsonDeserialiser or HllSketchJsonDeserialiser).
Creating cardinality values over JSON
When adding or updating a cardinality object over the REST API, you specify the vertex values to add to the sketch.
This is done by either using the offers field with HyperLogLogPlus, or the values field with HllSketch.
The HyperLogLog object is then instantiated and updated with
the values. The object can then be serialised and stored in the datastore.
The vertex object is serialised using the toString representation of the object.
Note
As the algorithms use the toString method, any user defined type introduced must override the toString method to return a meaningful string value representing the object, rather than the default class instance identifier. User defined types can be introduced either by adding further types to Gaffer or by adding a jar with the extra type(s) to the Gaffer classpath on startup.
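The effect of relying on a default string form can be illustrated with a small Python analogy. This is a sketch, not Gaffer code: a Python set of strings stands in for the sketch as an exact distinct count, and the class names DefaultString and MeaningfulString are hypothetical. Python's default str, like Java's default toString, identifies the instance rather than its content, so two logically equal values are counted as two.

```python
class DefaultString:
    """Hypothetical user defined type relying on the default string form."""

    def __init__(self, value):
        self.value = value


class MeaningfulString:
    """Hypothetical user defined type whose string form reflects its content."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f"MeaningfulString[value={self.value}]"


def distinct_strings(items):
    # Exact stand-in for the sketch: count distinct string representations.
    return len({str(item) for item in items})


# Two logically equal values of each type, all alive at the same time.
default_items = [DefaultString("B"), DefaultString("B")]
meaningful_items = [MeaningfulString("B"), MeaningfulString("B")]

# Identity-based strings differ per instance, so the count is inflated.
default_count = distinct_strings(default_items)
# Content-based strings match, so equal values are counted once.
meaningful_count = distinct_strings(meaningful_items)
```

With the default string form the two equal values are treated as distinct, whereas the overridden form collapses them into one.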
Depending on whether you are using HyperLogLogPlus or HllSketch, either the HyperLogLogPlusWithOffers or the HllSketchWithValues respectively is responsible for the JSON deserialisation.
The helper classes wrap the underlying sketch and include the following annotation on the offers/values field:
@JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, property = "class")
This signals to the Jackson ObjectMapper that it needs to look for the class field in each object and translate it to the correct object type.
Primitive data types over JSON
Primitive types are converted to the correct format automatically by the Jackson ObjectMapper. Here are some examples of the values:
"values": ["valueA", "value2",...]
"values": [1, 2,...]
"values": [1.1, 2.2,...]
Non-primitive data types over JSON
To convert non-primitive vertex values (like TypeSubTypeValue) to Java objects, each JSON value needs to contain the special field class, holding the class name of the object. The deserialiser uses this class field when deserialising with the JSONSerialiser deserialise method.
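A minimal Python sketch of building such values before sending them. The helper names type_sub_type_value and missing_class_field are hypothetical, and only the values list is built here, not the surrounding operation JSON. The class name and field names are those of the TypeSubTypeValue examples on this page.

```python
import json

# Class name as used in the TypeSubTypeValue examples on this page.
TYPE_SUB_TYPE_VALUE_CLASS = "uk.gov.gchq.gaffer.types.TypeSubTypeValue"


def type_sub_type_value(type_, sub_type, value):
    """Build the JSON form of a TypeSubTypeValue, including the class field."""
    return {
        "class": TYPE_SUB_TYPE_VALUE_CLASS,
        "type": type_,
        "subType": sub_type,
        "value": value,
    }


def missing_class_field(values):
    """Return the JSON objects in values that lack a top-level class field.

    Primitive values (strings, numbers) are skipped, as they need no class field.
    """
    return [v for v in values if isinstance(v, dict) and "class" not in v]


# A mix of non-primitive and primitive values.
values = [type_sub_type_value("t", "st", "B"), "plainString", 42]

# Fail early, before the request is sent, if a class field was forgotten.
untyped = missing_class_field(values)
if untyped:
    raise ValueError(f"values without a class field: {untyped}")

payload = json.dumps({"values": values})
```

Checking on the client side is a design choice rather than a requirement, but it surfaces a forgotten class field before the request reaches the server.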
Here are the Gaffer user defined types:
Note
Nested fields must also have the class field set if they are not standard Java objects (for example, the keySerialiser in the CustomMap type), so that the Jackson ObjectMapper knows how to convert the values to the correct Java objects.
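One way to spot nested objects that may need a class field is to walk the JSON value and list every object without one. The sketch below is only a review aid under stated assumptions: the function name find_objects_without_class and the example value (including its com.example class names and field names) are hypothetical, and an object it reports is not necessarily wrong, since standard Java objects do not need the field.

```python
def find_objects_without_class(node, path="$"):
    """Yield the path of every JSON object under node that has no class key."""
    if isinstance(node, dict):
        if "class" not in node:
            yield path
        # Recurse into every field, whether or not this object had a class.
        for key, child in node.items():
            yield from find_objects_without_class(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_objects_without_class(child, f"{path}[{index}]")


# Hypothetical nested value; the class and field names are placeholders.
example = {
    "class": "com.example.OuterType",
    "inner": {"name": "no class here"},
    "items": [{"class": "com.example.ItemType"}, {"name": "also missing"}],
}

# Paths of nested objects to review by hand.
paths = list(find_objects_without_class(example))
```

For the example value, the reported paths are the inner object and the second entry of items.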
Composing using Java
If you are composing the HllSketch with values using Java, before converting to JSON and sending via REST, you need to ensure that the values objects are translated to JSON with the correct class field added. To make sure of this, you could add the sketches-library JAR and use the HllSketchWithValues object to construct your query (or the equivalent for HyperLogLogPlus). This way you know that all the objects have the correct field added. You can then convert the HllSketchWithValues to JSON using the JSONSerialiser serialisation method:
final byte[] json = JSONSerialiser.serialise(hllSketchWithValues);
If you do not use HllSketchWithValues, ensure that the values list has the correct annotation so that the class is added on conversion by the Jackson ObjectMapper:
@JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, property = "class")
private List<Object> values = new ArrayList<>();
Composing using Python
An example of using Python to add a HyperLogLogPlus property with a TypeSubTypeValue offer:
from gafferpy import gaffer as g  # gafferpy module providing the operation and element classes

g.AddElements(
    input=[
        g.Entity(
            vertex="A",
            group="cardinality",
            properties={
                "hllp": g.hyper_log_log_plus([
                    {
                        "class": "uk.gov.gchq.gaffer.types.TypeSubTypeValue",
                        "type": "t",
                        "subType": "st",
                        "value": "B"
                    }
                ])
            }
        )
    ]
)
An example of using Python to add an HllSketch property with a TypeSubTypeValue value:
g.AddElements(
    input=[
        g.Entity(
            vertex="A",
            group="cardinality",
            properties={
                "hllSketch": g.hll_sketch([
                    {
                        "class": "uk.gov.gchq.gaffer.types.TypeSubTypeValue",
                        "type": "t",
                        "subType": "st",
                        "value": "B"
                    }
                ])
            }
        )
    ]
)
Adding user defined vertex types into offers
To add a user defined type you must ensure that:
- the type is on the Gaffer classpath
- the type overrides the toString method
- the type contains the correct annotations if you are converting from Java to JSON before sending via REST
The following user defined type example features the annotation required as
well as the @Override of the toString method: