HllSketch

The code for this example is HllSketchWalkthrough.

This example demonstrates how the HllSketch sketch from the Data Sketches library can be used to maintain an estimate of the degree of a vertex. Every time an edge A -> B is added to graph, we also add an Entity for A with a property of a HllSketch containing B, and an Entity for B with a property of a HllSketch containing A. The aggregator for the HllSketches merges them together so that after querying for the Entity for vertex X the HllSketch property would give us an estimate of the approximate degree.

Elements schema

This is our new elements schema. The edge has a property called 'approx_cardinality'. This will store the HllSketch object.

{
  "entities": {
    "cardinality": {
      "vertex": "vertex.string",
      "properties": {
        "approxCardinality": "hllsketch"
      }
    }
  }
}
Types schema

We have added a new type - 'hllsketch'. This is a com.yahoo.sketches.hll.HllSketch object. We also added in the serialiser and aggregator for the HllSketch object. Gaffer will automatically aggregate these sketches, using the provided aggregator, so they will keep up to date as new entities are added to the graph.

{
  "types": {
    "vertex.string": {
      "class": "java.lang.String",
      "validateFunctions": [
        {
          "class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
        }
      ]
    },
    "hllsketch": {
      "class": "com.yahoo.sketches.hll.HllSketch",
      "aggregateFunction": {
        "class": "uk.gov.gchq.gaffer.sketches.datasketches.cardinality.binaryoperator.HllSketchAggregator"
      },
      "serialiser": {
        "class": "uk.gov.gchq.gaffer.sketches.datasketches.cardinality.serialisation.HllSketchSerialiser"
      }
    }
  }
}

Only one entity is in the graph. This was added 1000 times, and each time it had the 'approxCardinality' property containing a vertex that A had been seen in an Edge with. Here is the Entity:

Entity[vertex=A,group=cardinality,properties=Properties[approxCardinality=<com.yahoo.sketches.hll.HllSketch>### HLL SKETCH SUMMARY: 
  Log Config K   : 10
  Hll Target     : HLL_4
  Current Mode   : HLL
  LB             : 986.8136434119266
  Estimate       : 1018.8398354963819
  UB             : 1052.991638617674
  OutOfOrder Flag: true
  CurMin         : 0
  NumAtCurMin    : 374
  HipAccum       : 1045.0654080765041
  KxQ0           : 562.4995727539062
  KxQ1           : 0.0
]]

This is not very illuminating as this just shows the default toString() method on the sketch.

We can fetch the cardinality for the vertex using the following code:

Java
final GetElements query = new GetElements.Builder()
        .input(new EntitySeed("A"))
        .build();
final Element element;
try (final CloseableIterable<? extends Element> elements = graph.execute(query, user)) {
    element = elements.iterator().next();
}
final HllSketch hllSketch = (HllSketch) element.getProperty("approxCardinality");
final double approxDegree = hllSketch.getEstimate();
final String degreeEstimate = "Entity A has approximate degree " + approxDegree;

The results are as follows. As an Entity was added 1000 times, each time with a different vertex, then we would expect the degree to be approximately 1000.

Entity A has approximate degree 1018.8398354963819

results matching ""

    No results matching ""