Using Sketches with the REST API
This page explains some nuances and special steps required when using classes from the Sketches library with the REST API. If you just want to know how to use the Sketches library to estimate cardinality, see the cardinality docs page.
Sketches Library
To learn more about the Sketches library, see the advanced properties reference page.
The Sketches library is included by default with the Map and Accumulo stores, because sketches-library is a dependency in the POM of each of those store modules. Serialisation is also handled: the getJsonSerialiserModules method in both the Map and Accumulo property classes returns SketchesJsonModules. The JSONSerialiser then loads these modules and uses them when deserialising REST JSON queries.
HyperLogLog sketches
Gaffer currently supports two HyperLogLog algorithms: the Datasketches HllSketch and the Clearspring HyperLogLogPlus. HyperLogLogPlus is deprecated in Gaffer, and we recommend HllSketch for the reasons described in the advanced properties guide.
The HllSketch and HyperLogLogPlus sketches can be used to store an approximation of the cardinality of an element, that is, the number of distinct values added to its sketch. The JSONSerialiser converts the JSON of a query into Java objects. During this deserialisation, the sketch's JSON representation is converted to a Java object by the ObjectMapper module, which uses the relevant deserialiser (HyperLogLogPlusJsonDeserialiser or HllSketchJsonDeserialiser).
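To make "an approximation of cardinality" concrete, here is a toy HyperLogLog written in plain Java. It is a minimal sketch for illustration only, not the Datasketches HllSketch or Clearspring HyperLogLogPlus code, and the class name ToyHyperLogLog is hypothetical. Like the Gaffer sketches, it adds each value through its toString form. It keeps one small register per bucket, so memory stays fixed however many values are added, at the cost of returning an estimate rather than an exact count.

```java
/**
 * Toy HyperLogLog for illustration only (hypothetical class, not the
 * Datasketches or Clearspring implementation used by Gaffer).
 */
class ToyHyperLogLog {
    private final int p;            // number of bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // highest "rank" seen in each register

    ToyHyperLogLog(int p) {
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // SplitMix64 mixing step: spreads the 32-bit hashCode over 64 bits
    private static long mix(long z) {
        z += 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Values are added via toString(), mirroring how the Gaffer sketches treat vertices
    void update(Object value) {
        long h = mix(value.toString().hashCode());
        int index = (int) (h >>> (64 - p));               // top p bits choose the register
        int maxRank = 64 - p + 1;
        int rank = Long.numberOfLeadingZeros(h << p) + 1; // position of the first 1-bit in the rest
        if (rank > maxRank) {
            rank = maxRank;                               // the remaining bits were all zero
        }
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    double estimate() {
        double sum = 0.0;
        int zeroRegisters = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeroRegisters++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting
        if (raw <= 2.5 * m && zeroRegisters > 0) {
            return m * Math.log((double) m / zeroRegisters);
        }
        return raw;
    }
}
```

With p = 12 (4,096 registers), the typical relative error is roughly 1.04/sqrt(4096), about 1.6%. Adding the same value a second time never changes a register, so duplicates do not inflate the estimate.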
Creating cardinality values over JSON
When adding or updating a cardinality object over the REST API, you specify the vertex values to add to the sketch. You do this with either the offers field for HyperLogLogPlus or the values field for HllSketch. The HyperLogLog object is then instantiated and updated with these values, after which it can be serialised and stored in the datastore. Each vertex object is added to the sketch using its toString representation.
Note
As the algorithms use the toString method, any user defined type you introduce must override toString to return a meaningful string value representing the object, rather than the default class instance identifier. User defined types can be introduced either by adding further types to Gaffer or by adding a JAR containing the extra type(s) to the Gaffer classpath on startup.
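The effect of a missing toString override can be sketched in plain Java. The two vertex classes below are hypothetical. An exact HashSet of toString values stands in for the sketch, an assumption made to keep the example dependency-free. With the default Object.toString, every instance yields a different "ClassName@hash" string, so equal vertices are counted as distinct.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical vertex type that keeps the default Object.toString()
class VertexWithoutToString {
    private final String id;
    VertexWithoutToString(String id) { this.id = id; }
}

// Hypothetical vertex type that overrides toString() with a meaningful value
class VertexWithToString {
    private final String id;
    VertexWithToString(String id) { this.id = id; }

    @Override
    public String toString() {
        return "VertexWithToString[" + id + "]";
    }
}

class ToStringDemo {
    // Exact stand-in for a cardinality sketch: counts distinct toString() values
    static int distinctByToString(Object... values) {
        Set<String> seen = new HashSet<>();
        for (Object value : values) {
            seen.add(value.toString());
        }
        return seen.size();
    }
}
```

Calling ToStringDemo.distinctByToString with two VertexWithToString("a") instances counts one distinct value. The same call with two VertexWithoutToString("a") instances counts two, because their default strings differ even though they represent the same vertex.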
Depending on whether you are using HyperLogLogPlus or HllSketch, either HyperLogLogPlusWithOffers or HllSketchWithValues respectively is responsible for the JSON deserialisation. These helper classes wrap the underlying sketch and include the following annotation on the offers/values field:
@JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, property = "class")
This signals to the Jackson ObjectMapper that it needs to look for the class field in each object and translate the object to the correct type.
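The class field holds a fully qualified class name that has to be resolved against the classpath. Jackson's actual polymorphic type handling is more involved. The hypothetical ClassFieldResolver below only sketches the core idea with Class.forName, assuming a parsed JSON object is represented as a Map. It also shows why a user defined type that is not on the Gaffer classpath cannot be deserialised.

```java
import java.util.Map;

// Hypothetical helper: sketches how a "class" field names the type to build.
// A parsed JSON object is assumed to be represented as a Map.
class ClassFieldResolver {
    static Class<?> resolve(Map<String, Object> jsonObject) {
        Object className = jsonObject.get("class");
        if (!(className instanceof String)) {
            throw new IllegalArgumentException("JSON object has no \"class\" field");
        }
        try {
            // The name must be fully qualified and the class must be on the classpath
            return Class.forName((String) className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found on classpath: " + className, e);
        }
    }
}
```

For example, an object with "class": "java.lang.String" resolves to String.class, while "uk.gov.gchq.gaffer.types.TypeSubTypeValue" only resolves when the JAR containing that type is on the classpath.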
Primitive data types over JSON
Primitive types are converted to the correct format automatically by the Jackson ObjectMapper. Here are some examples of the values:
"values": ["valueA", "value2",...]
"values": [1, 2,...]
"values": [1.1, 2.2,...]
Non-primitive data types over JSON
In order to convert non-primitive vertex values (like TypeSubTypeValue) to Java objects, the JSON values need to contain the special field class, holding the fully qualified class name of the object. The deserialiser uses this class field when deserialising with the JSONSerialiser deserialise method.
Here are the Gaffer user defined types:
Note
Nested fields must also have the class field set if they are not a standard Java Object (for example, the keySerialiser in the CustomMap type), so that the Jackson ObjectMapper knows how to convert the values to the correct Java objects.
Composing using Java
If you are composing the HllSketch with values using Java before converting it to JSON and sending it via REST, you need to ensure that the values objects are translated to JSON with the correct class field added.
To make sure of this, you could add the sketches-library JAR and use the HllSketchWithValues object to construct your query (or HyperLogLogPlusWithOffers for HyperLogLogPlus). This way you know that all the objects have the correct field added. You can then convert the HllSketchWithValues to JSON using the JSONSerialiser serialise method, and read it back with deserialise:
final byte[] json = JSONSerialiser.serialise(hllSketchWithValues);
final HllSketchWithValues fromJson = JSONSerialiser.deserialise(json, HllSketchWithValues.class);
If you write your own class instead of using HllSketchWithValues, ensure that the values list has the correct annotation so that the class field is added on conversion by the Jackson ObjectMapper:
@JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, property = "class")
private List<Object> values = new ArrayList<>();
Composing using Python
An example of using Python to add a HyperLogLogPlus property with a TypeSubTypeValue offer:
g.AddElements(
    input=[
        g.Entity(
            vertex="A",
            group="cardinality",
            properties={
                "hllp": g.hyper_log_log_plus([
                    {
                        "class": "uk.gov.gchq.gaffer.types.TypeSubTypeValue",
                        "type": "t",
                        "subType": "st",
                        "value": "B"
                    }
                ])
            }
        )
    ]
)
An example of using Python to add an HllSketch property with a TypeSubTypeValue value:
g.AddElements(
    input=[
        g.Entity(
            vertex="A",
            group="cardinality",
            properties={
                "hllSketch": g.hll_sketch([
                    {
                        "class": "uk.gov.gchq.gaffer.types.TypeSubTypeValue",
                        "type": "t",
                        "subType": "st",
                        "value": "B"
                    }
                ])
            }
        )
    ]
)
Adding user defined vertex types into offers
To add a user defined type you must ensure that:
- the type is on the Gaffer classpath
- the type overrides the toString method
- the type contains the correct annotations if you are converting from Java to JSON before sending via REST
The following user defined type example features the required annotation as well as the @Override of the toString method: