ReservoirItemsSketch
The code for this example is ReservoirItemsSketchWalkthrough.
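This example demonstrates how the ReservoirItemsSketch<String> sketch from the DataSketches library can be used to maintain a fixed-size random sample of the strings added to a property on vertices and edges.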
Elements schema
This is our new elements schema. The "red" edge has a property called 'stringsSample', and the "blueEntity" entity has a property called 'neighboursSample'. Each of these stores a ReservoirItemsSketch<String> object.
{
  "entities": {
    "blueEntity": {
      "vertex": "vertex.string",
      "properties": {
        "neighboursSample": "reservoir.strings.sketch"
      }
    }
  },
  "edges": {
    "red": {
      "source": "vertex.string",
      "destination": "vertex.string",
      "directed": "false",
      "properties": {
        "stringsSample": "reservoir.strings.sketch"
      }
    },
    "blue": {
      "source": "vertex.string",
      "destination": "vertex.string",
      "directed": "false"
    }
  }
}
Types schema
We have added a new type, 'reservoir.strings.sketch', whose class is com.yahoo.sketches.sampling.ReservoirItemsSketch. We have also specified the serialiser and the aggregator for the ReservoirItemsSketch object. Gaffer automatically aggregates these sketches using the provided aggregator, so they stay up to date as new elements are added to the graph.
{
  "types": {
    "vertex.string": {
      "class": "java.lang.String",
      "validateFunctions": [
        {
          "class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
        }
      ]
    },
    "reservoir.strings.sketch": {
      "class": "com.yahoo.sketches.sampling.ReservoirItemsSketch",
      "aggregateFunction": {
        "class": "uk.gov.gchq.gaffer.sketches.datasketches.sampling.binaryoperator.ReservoirItemsSketchAggregator"
      },
      "serialiser": {
        "class": "uk.gov.gchq.gaffer.sketches.datasketches.sampling.serialisation.ReservoirStringsSketchSerialiser"
      }
    },
    "false": {
      "class": "java.lang.Boolean",
      "validateFunctions": [
        {
          "class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
        }
      ]
    }
  }
}
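To give some intuition for what aggregating two reservoir samples involves, here is a minimal sketch in plain Java. It is not the code used by ReservoirItemsSketchAggregator or by the DataSketches library; the names ReservoirMergeSketch, Reservoir and merge are hypothetical and exist only for this illustration. It assumes each reservoir holds min(k, n) items drawn uniformly from the n items it has seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: a toy reservoir and merge step, not the DataSketches or Gaffer code. */
public class ReservoirMergeSketch {

    /** A sample of at most k items, plus the count n of items it summarises. */
    static final class Reservoir {
        final int k;
        final long n;
        final List<String> samples;

        Reservoir(final int k, final long n, final List<String> samples) {
            this.k = k;
            this.n = n;
            this.samples = samples;
        }
    }

    /**
     * Merges two reservoirs. Each output slot is filled from a with probability
     * remainingA / (remainingA + remainingB), so that every item seen by either
     * input has the same chance of ending up in the merged sample.
     */
    static Reservoir merge(final Reservoir a, final Reservoir b, final Random random) {
        final int k = Math.min(a.k, b.k);
        final List<String> fromA = new ArrayList<>(a.samples);
        final List<String> fromB = new ArrayList<>(b.samples);
        long remainingA = a.n;
        long remainingB = b.n;
        final List<String> merged = new ArrayList<>(k);
        while (merged.size() < k && remainingA + remainingB > 0) {
            // Uniform position among all items not yet accounted for.
            final long pick = (long) (random.nextDouble() * (remainingA + remainingB));
            if (pick < remainingA) {
                merged.add(fromA.remove(random.nextInt(fromA.size())));
                remainingA--;
            } else {
                merged.add(fromB.remove(random.nextInt(fromB.size())));
                remainingB--;
            }
        }
        return new Reservoir(k, a.n + b.n, merged);
    }

    public static void main(final String[] args) {
        final Random random = new Random(42L);
        Reservoir total = new Reservoir(20, 0, new ArrayList<>());
        // Merge in 1000 single-item reservoirs, one per added element.
        for (int i = 0; i < 1000; i++) {
            final List<String> single = new ArrayList<>();
            single.add("string-" + i);
            total = merge(total, new Reservoir(20, 1, single), random);
        }
        // Prints: k=20, n=1000, size=20
        System.out.println("k=" + total.k + ", n=" + total.n + ", size=" + total.samples.size());
    }
}
```

The merged reservoir never grows beyond k items, while n keeps counting every item that contributed to it.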
An edge A-B of group "red" was added to the graph 1000 times. Each time it had the stringsSample property containing a randomly generated string. Here is the edge:
Edge[source=A,destination=B,directed=false,matchedVertex=SOURCE,group=red,properties=Properties[stringsSample=<com.yahoo.sketches.sampling.ReservoirItemsSketch>
### ReservoirItemsSketch SUMMARY:
k : 20
n : 1000
Current size : 20
Resize factor: X8
### END SKETCH SUMMARY
]]
This is not very illuminating, as it just shows the output of the sketch's default toString() method. To get value from it we need to call a method on the ReservoirItemsSketch object:
// Fetch the undirected edge A-B
final GetElements query = new GetElements.Builder()
        .input(new EdgeSeed("A", "B", DirectedType.UNDIRECTED))
        .build();
final Element edge;
try (final CloseableIterable<? extends Element> edges = graph.execute(query, user)) {
    edge = edges.iterator().next();
}
// Unchecked cast: the schema declares stringsSample as a reservoir sketch of strings
final ReservoirItemsSketch<String> stringsSketch = ((ReservoirItemsSketch<String>) edge.getProperty("stringsSample"));
final String[] samples = stringsSketch.getSamples();
// Build a comma-separated list of up to 10 of the retained samples
final StringBuilder sb = new StringBuilder("10 samples: ");
for (int i = 0; i < 10 && i < samples.length; i++) {
    if (i > 0) {
        sb.append(", ");
    }
    sb.append(samples[i]);
}
The results contain a random sample of the strings added to the edge:
10 samples: HHEJIACJGH, IEJDBAGEAH, IIAGEJIBGF, DDBIAIEGHD, ABEEIADDGB, ACIDACAIIG, CEBCFFHCFI, CBCDJDCFFD, AHBHIHFDJI, GDFABEFFAF
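For readers unfamiliar with reservoir sampling, the following is a minimal sketch of the classic technique (often called Algorithm R) in plain Java. It shows how a sample of at most k items can be kept over a stream of n items so that each item is equally likely to be retained. The class SimpleReservoir is hypothetical and is not the DataSketches implementation, which may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: classic reservoir sampling, not the DataSketches code. */
public class SimpleReservoir<T> {
    private final int k;
    private final Random random;
    private final List<T> samples;
    private long n;

    public SimpleReservoir(final int k, final Random random) {
        this.k = k;
        this.random = random;
        this.samples = new ArrayList<>(k);
    }

    /** Offers one item to the reservoir. */
    public void update(final T item) {
        n++;
        if (samples.size() < k) {
            // The reservoir is not yet full, so keep everything.
            samples.add(item);
            return;
        }
        // Keep the new item with probability k/n by drawing a slot in [0, n)
        // and replacing the existing sample only if the slot is below k.
        final long slot = (long) (random.nextDouble() * n);
        if (slot < k) {
            samples.set((int) slot, item);
        }
    }

    /** Number of items offered so far. */
    public long getN() {
        return n;
    }

    /** Copy of the retained samples (at most k of them). */
    public List<T> getSamples() {
        return new ArrayList<>(samples);
    }

    public static void main(final String[] args) {
        final SimpleReservoir<String> reservoir = new SimpleReservoir<>(20, new Random(1L));
        for (int i = 0; i < 1000; i++) {
            reservoir.update("string-" + i);
        }
        // Prints: n=1000, retained=20
        System.out.println("n=" + reservoir.getN() + ", retained=" + reservoir.getSamples().size());
    }
}
```

With k = 20 and 1000 updates this mirrors the sketch summary shown earlier: n is 1000 while the current size stays at 20.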
500 edges of group "blue" were also added to the graph (edges X-Y0, X-Y1, ..., X-Y499). For each of these edges, an Entity was created for both the source and the destination vertex. Each Entity has a 'neighboursSample' property containing the vertex at the other end of the edge. We now get the Entity for the vertex X and display a sample of its neighbours:
// Fetch the "blueEntity" entity for vertex X
final GetElements query2 = new GetElements.Builder()
        .input(new EntitySeed("X"))
        .view(new View.Builder()
                .entity("blueEntity")
                .build())
        .build();
final Element entity;
try (final CloseableIterable<? extends Element> entities = graph.execute(query2, user)) {
    entity = entities.iterator().next();
}
// Unchecked cast: the schema declares neighboursSample as a reservoir sketch of strings
final ReservoirItemsSketch<String> neighboursSketch = ((ReservoirItemsSketch<String>) entity.getProperty("neighboursSample"));
final String[] neighboursSample = neighboursSketch.getSamples();
// Reuse the StringBuilder from the previous snippet
sb.setLength(0);
sb.append("10 samples: ");
for (int i = 0; i < 10 && i < neighboursSample.length; i++) {
    if (i > 0) {
        sb.append(", ");
    }
    sb.append(neighboursSample[i]);
}
The results are:
10 samples: Y45, Y38, Y196, Y108, Y461, Y296, Y337, Y7, Y413, Y148