HyperLogLogPlus
The code for this example is HyperLogLogPlusWalkthrough.
This example demonstrates how the HyperLogLogPlus sketch from the Clearspring library can be used to maintain an estimate of the degree of a vertex. Every time an edge A -> B is added to graph, we also add an Entity for A with a property of a HyperLogLogPlus containing B, and an Entity for B with a property of a HyperLogLogPlus containing A. The aggregator for the HyperLogLogPluses merges them together so that after querying for the Entity for vertex X the HyperLogLogPlus property gives us an estimate of the approximate degree.
Elements schema
This is our new elements schema. The edge has a property called 'approx_cardinality'. This will store the HyperLogLogPlus object.
{
"entities": {
"cardinality": {
"vertex": "vertex.string",
"properties": {
"approxCardinality": "hyperloglogplus"
}
}
}
}
Types schema
We have added a new type - 'hyperloglogplus'. This is a com.clearspring.analytics.stream.cardinality.HyperLogLogPlus object. We also added in the serialiser and aggregator for the HyperLogLogPlus object. Gaffer will automatically aggregate these sketches, using the provided aggregator, so they will keep up to date as new entities are added to the graph.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"hyperloglogplus": {
"class": "com.clearspring.analytics.stream.cardinality.HyperLogLogPlus",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.sketches.clearspring.cardinality.binaryoperator.HyperLogLogPlusAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.sketches.clearspring.cardinality.serialisation.HyperLogLogPlusSerialiser"
}
}
}
}
Only one entity is in the graph. This was added 1000 times, and each time it had the 'approxCardinality' property containing a vertex that A had been seen in an Edge with. Here is the Entity:
Entity[vertex=A,group=cardinality,properties=Properties[approxCardinality=<com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>com.clearspring.analytics.stream.cardinality.HyperLogLogPlus@1bbd2da7]]
This is not very illuminating as this just shows the default toString()
method on the sketch.
We can fetch the cardinality for the vertex using the following code:
final GetElements query = new GetElements.Builder()
.input(new EntitySeed("A"))
.build();
final Element element;
try (final CloseableIterable<? extends Element> elements = graph.execute(query, user)) {
element = elements.iterator().next();
}
final HyperLogLogPlus hyperLogLogPlus = (HyperLogLogPlus) element.getProperty("approxCardinality");
final double approxDegree = hyperLogLogPlus.cardinality();
final String degreeEstimate = "Entity A has approximate degree " + approxDegree;
The results are as follows. As an Entity was added 1000 times, each time with a different vertex, then we would expect the degree to be approximately 1000.
Entity A has approximate degree 1113.0