Filtering
The code for this example is Filtering.
Filtering in Gaffer is designed so it can be applied server side and distributed across a cluster for performance.
In this example we’ll query for some Edges and filter the results based on the aggregated value of a property. We will use the same schema and data as the previous example.
If we want to query for the RoadUse Edges containing vertex "10", the operation would look like this:
Java:

final GetElements getElements = new GetElements.Builder()
        .input(new EntitySeed("10"))
        .view(new View.Builder()
                .edge("RoadUse")
                .build())
        .build();
final CloseableIterable<? extends Element> results = graph.execute(getElements, user);

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EntitySeed",
    "vertex" : "10"
  } ],
  "view" : {
    "edges" : {
      "RoadUse" : { }
    }
  }
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EntitySeed",
    "vertex" : "10"
  } ],
  "view" : {
    "edges" : {
      "RoadUse" : { }
    }
  }
}

Python:

g.GetElements(
    view=g.View(
        edges=[
            g.ElementDefinition(
                group="RoadUse"
            )
        ],
        all_edges=False,
        all_entities=False
    ),
    input=[
        g.EntitySeed(
            vertex="10"
        )
    ]
)

Here are the result Edges with the counts aggregated:
Edge[source=11,destination=10,directed=true,matchedVertex=DESTINATION,group=RoadUse,properties=Properties[count=<java.lang.Long>1]]
Edge[source=10,destination=11,directed=true,matchedVertex=SOURCE,group=RoadUse,properties=Properties[count=<java.lang.Long>3]]
Now let’s look at how to filter which Edges are returned based on the aggregated value of their count.
For example, we can return only the Edges containing vertex "10" where the "count" is greater than 2.
We do this using a View and ViewElementDefinition like this:
Java:

final GetElements getEdgesWithCountMoreThan2 = new GetElements.Builder()
        .input(new EntitySeed("10"))
        .view(new View.Builder()
                .edge("RoadUse", new ViewElementDefinition.Builder()
                        .postAggregationFilter(new ElementFilter.Builder()
                                .select("count")
                                .execute(new IsMoreThan(2L))
                                .build())
                        .build())
                .build())
        .build();
final CloseableIterable<? extends Element> filteredResults = graph.execute(getEdgesWithCountMoreThan2, user);

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EntitySeed",
    "vertex" : "10"
  } ],
  "view" : {
    "edges" : {
      "RoadUse" : {
        "postAggregationFilterFunctions" : [ {
          "selection" : [ "count" ],
          "predicate" : {
            "class" : "IsMoreThan",
            "orEqualTo" : false,
            "value" : {
              "Long" : 2
            }
          }
        } ]
      }
    }
  }
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EntitySeed",
    "vertex" : "10"
  } ],
  "view" : {
    "edges" : {
      "RoadUse" : {
        "postAggregationFilterFunctions" : [ {
          "selection" : [ "count" ],
          "predicate" : {
            "class" : "uk.gov.gchq.koryphe.impl.predicate.IsMoreThan",
            "orEqualTo" : false,
            "value" : {
              "java.lang.Long" : 2
            }
          }
        } ]
      }
    }
  }
}

Python:

g.GetElements(
    view=g.View(
        edges=[
            g.ElementDefinition(
                group="RoadUse",
                post_aggregation_filter_functions=[
                    g.PredicateContext(
                        selection=[
                            "count"
                        ],
                        predicate=g.IsMoreThan(
                            value={'java.lang.Long': 2},
                            or_equal_to=False
                        )
                    )
                ]
            )
        ],
        all_edges=False,
        all_entities=False
    ),
    input=[
        g.EntitySeed(
            vertex="10"
        )
    ]
)

Our ViewElementDefinition allows us to perform post-aggregation filtering using an IsMoreThan Predicate.
Querying with this View, we now get only the Edges containing vertex "10" where the "count" is greater than 2:
Edge[source=10,destination=11,directed=true,matchedVertex=SOURCE,group=RoadUse,properties=Properties[count=<java.lang.Long>3]]
In the filter, we selected the count property. This extracts the value of the count property and passes it to the IsMoreThan Predicate.
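As a rough illustration of the select-then-test step described above, here is a minimal sketch in plain Python. It does not use the Gaffer or gafferpy API. The raw edge data and the helper names (aggregate, is_more_than, post_aggregation_filter) are assumptions made for this example, with counts chosen so that the aggregated values match the results shown earlier.

```python
from collections import defaultdict

# Hypothetical raw RoadUse edges as (source, destination, count) tuples.
raw_edges = [
    ("10", "11", 1),
    ("10", "11", 2),
    ("11", "10", 1),
]


def aggregate(edges):
    """Sum the counts of edges that share the same source and destination."""
    totals = defaultdict(int)
    for source, destination, count in edges:
        totals[(source, destination)] += count
    return [
        {"source": s, "destination": d, "count": c}
        for (s, d), c in totals.items()
    ]


def is_more_than(threshold):
    """Return a predicate testing value > threshold (strictly greater)."""
    return lambda value: value > threshold


def post_aggregation_filter(elements, selection, predicate):
    """Extract the selected property from each element and keep those that pass."""
    return [e for e in elements if predicate(e[selection])]


aggregated = aggregate(raw_edges)
filtered = post_aggregation_filter(aggregated, "count", is_more_than(2))
print(filtered)
```

In this sketch only the 10 to 11 edge survives, because its aggregated count is 3. Applying the same predicate to the raw, unaggregated counts (1, 2 and 1) would keep nothing, which is why the stage at which a filter runs matters.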
We can choose to select any property or one of the following identifiers:
- VERTEX - this is the vertex on an Entity
- SOURCE - this is the source vertex on an Edge
- DESTINATION - this is the destination vertex on an Edge
- DIRECTED - this is the directed field on an Edge
- MATCHED_VERTEX - this is the vertex that was matched in the query, either the SOURCE or the DESTINATION
- ADJACENT_MATCHED_VERTEX - this is the vertex adjacent to the one that was matched in the query, either the SOURCE or the DESTINATION. For example, if your seed matches the source of the edge, this resolves to the DESTINATION value.
We chose the IsMoreThan Predicate, but the full list of our core Predicates is documented in Predicates. You can also write your own Predicate implementations and include them on the class path. When choosing a Predicate, you must ensure that your input selection (the property and identifier types) matches the Predicate's input types. For example, the IsMoreThan Predicate accepts a single Comparable value, whereas the IsXMoreThanY Predicate accepts two Comparable values. The Predicate inputs are also documented within the Predicate examples documentation.
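The requirement that the selection matches the Predicate's inputs can be sketched in plain Python as follows. This is not the Koryphe or Gaffer API. The functions is_more_than, is_x_more_than_y and apply_filter, and the previousCount property, are hypothetical names used only to show how the number of selected values lines up with the number of predicate arguments.

```python
def is_more_than(threshold):
    """One-input predicate: tests a single comparable value."""
    return lambda value: value > threshold


def is_x_more_than_y():
    """Two-input predicate: tests whether the first value exceeds the second."""
    return lambda x, y: x > y


def apply_filter(element, selection, predicate):
    """Pass the selected values, in order, as the predicate's arguments."""
    return predicate(*(element[name] for name in selection))


# Hypothetical element with two numeric properties.
edge = {"SOURCE": "10", "DESTINATION": "11", "count": 3, "previousCount": 1}

# One selected value for a one-input predicate.
one_input = apply_filter(edge, ["count"], is_more_than(2))

# Two selected values for a two-input predicate.
two_inputs = apply_filter(edge, ["count", "previousCount"], is_x_more_than_y())

print(one_input, two_inputs)
```

In this sketch, selecting only "count" for the two-input predicate raises a TypeError, which mirrors the kind of mismatch the paragraph above warns about.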
For more information on Views and filtering, see Views.
Additional Filtering
In addition to filtering using a View, there are extra filters that can be applied to specific operations.
directedType
GetElements, GetAllElements and GetAdjacentIds have a 'directedType' field that you can configure, telling Gaffer that you only want edges that are DIRECTED, UNDIRECTED or EITHER.
The default value is EITHER.
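The effect of directedType can be pictured with a small plain-Python sketch. It is not Gaffer code: the function matches_directed_type and the sample edges are assumptions for illustration only.

```python
def matches_directed_type(edge_is_directed, directed_type="EITHER"):
    """Decide whether an edge passes the directedType setting."""
    if directed_type == "EITHER":
        return True
    if directed_type == "DIRECTED":
        return edge_is_directed
    if directed_type == "UNDIRECTED":
        return not edge_is_directed
    raise ValueError(f"Unknown directedType: {directed_type}")


# Hypothetical edges as (source, destination, directed) tuples.
edges = [("A", "B", False), ("D", "B", True), ("B", "E", True)]

directed_only = [e for e in edges if matches_directed_type(e[2], "DIRECTED")]
undirected_only = [e for e in edges if matches_directed_type(e[2], "UNDIRECTED")]
everything = [e for e in edges if matches_directed_type(e[2])]

print(directed_only)
print(undirected_only)
```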
includeIncomingOutGoing
GetElements and GetAdjacentIds have an 'includeIncomingOutGoing' field that you can configure, telling Gaffer that you only want edges that are OUTGOING, INCOMING, or EITHER, in relation to your seed. This is only applicable to directed edges.
The default value is EITHER.
For example if you have edges:
- A - B
- B - C
- D -> B
- B -> E
- F - G
- H -> I
and you provide a seed B, then:
- OUTGOING would only get back A - B, B - C and B -> E.
- INCOMING would only get back A - B, B - C and D -> B.
- EITHER would get back all edges that have a B vertex.
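The example above can be reproduced with a short plain-Python sketch. It models the behaviour described in this section rather than calling Gaffer. The function names get_edges and label are assumptions, and the sketch takes as given that undirected edges touching the seed are always returned, since the setting only applies to directed edges.

```python
# The example edges as (source, destination, directed) tuples.
EDGES = [
    ("A", "B", False),
    ("B", "C", False),
    ("D", "B", True),
    ("B", "E", True),
    ("F", "G", False),
    ("H", "I", True),
]


def label(edge):
    """Render an edge as 'A - B' (undirected) or 'A -> B' (directed)."""
    source, destination, directed = edge
    return f"{source} {'->' if directed else '-'} {destination}"


def get_edges(seed, in_out="EITHER"):
    """Return the edges touching the seed that pass the in/out setting."""
    results = []
    for edge in EDGES:
        source, destination, directed = edge
        if seed not in (source, destination):
            continue  # the edge does not touch the seed at all
        if directed and in_out == "OUTGOING" and source != seed:
            continue  # directed edge pointing into the seed
        if directed and in_out == "INCOMING" and destination != seed:
            continue  # directed edge pointing away from the seed
        results.append(label(edge))
    return results


outgoing = get_edges("B", "OUTGOING")
incoming = get_edges("B", "INCOMING")
either = get_edges("B")

print(outgoing)
print(incoming)
print(either)
```

With seed B, this sketch returns A - B, B - C and B -> E for OUTGOING, A - B, B - C and D -> B for INCOMING, and all four edges touching B for EITHER.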
seedMatching
GetElements has a 'seedMatching' field that you can configure, telling Gaffer that you only want elements that are EQUAL or RELATED to your seeds.
The default value is RELATED.
EQUAL will only return Entities and Edges with identifiers that match the seed exactly.
- if you provide an Entity seed, you will only get back Entities that have the same vertex value.
- if you provide an Edge seed, you will only get back Edges that have the same source, destination and directed values.
RELATED will return the EQUAL results (as above) and additional Entities and Edges:
- if you provide an Entity seed, you will also get back Edges that have the same source or destination as the vertex value.
- if you provide an Edge seed, you will also get back Entities that have the same vertex value as the source or destination.
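The EQUAL and RELATED rules above can be summarised in a small plain-Python sketch. It is a model of the described behaviour, not the Gaffer API. The sample data and the get_elements function are assumptions, with seeds and elements represented as dictionaries.

```python
# Hypothetical stored elements.
ENTITIES = [{"vertex": "A"}, {"vertex": "B"}]
EDGES = [
    {"source": "A", "destination": "B", "directed": True},
    {"source": "B", "destination": "C", "directed": True},
]


def get_elements(seed, seed_matching="RELATED"):
    """Return the elements matching the seed under EQUAL or RELATED rules."""
    if "vertex" in seed:
        # Entity seed: EQUAL matches Entities, RELATED adds touching Edges.
        equal = [e for e in ENTITIES if e["vertex"] == seed["vertex"]]
        related = [
            e for e in EDGES
            if seed["vertex"] in (e["source"], e["destination"])
        ]
    else:
        # Edge seed: EQUAL matches identical Edges, RELATED adds end Entities.
        equal = [e for e in EDGES if e == seed]
        related = [
            e for e in ENTITIES
            if e["vertex"] in (seed["source"], seed["destination"])
        ]
    return equal if seed_matching == "EQUAL" else equal + related


entity_seed = {"vertex": "A"}
edge_seed = {"source": "A", "destination": "B", "directed": True}

print(get_elements(entity_seed, "EQUAL"))
print(get_elements(entity_seed, "RELATED"))
print(get_elements(edge_seed, "EQUAL"))
print(get_elements(edge_seed, "RELATED"))
```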
Deprecation
The seedMatching flag is now deprecated. The default for seedMatching is RELATED, so if you rely on that default, nothing will change. To run equivalent Operations with EQUAL seedMatching, you must use a View instead. The examples below show how to replace EQUAL seedMatching with a View:
Edges
SeedMatching:
Java:

final GetElements getEdgesWithSeedMatching = new GetElements.Builder()
        .input(new EdgeSeed("source", "dest", true))
        .seedMatching(SeedMatching.SeedMatchingType.EQUAL)
        .build();

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EdgeSeed",
    "source" : "source",
    "destination" : "dest",
    "matchedVertex" : "SOURCE",
    "directedType" : "DIRECTED"
  } ],
  "seedMatching" : "EQUAL"
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EdgeSeed",
    "source" : "source",
    "destination" : "dest",
    "matchedVertex" : "SOURCE",
    "directedType" : "DIRECTED"
  } ],
  "seedMatching" : "EQUAL"
}

Python:

g.GetElements(
    input=[
        g.EdgeSeed(
            source="source",
            destination="dest",
            directed_type="DIRECTED",
            matched_vertex="SOURCE"
        )
    ],
    seed_matching="EQUAL"
)

View:
Java:

final GetElements getEdgesWithoutSeedMatching = new GetElements.Builder()
        .input(new EdgeSeed("source", "dest", true))
        .view(new View.Builder()
                .edge("group1")
                .build())
        .build();

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EdgeSeed",
    "source" : "source",
    "destination" : "dest",
    "matchedVertex" : "SOURCE",
    "directedType" : "DIRECTED"
  } ],
  "view" : {
    "edges" : {
      "group1" : { }
    }
  }
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EdgeSeed",
    "source" : "source",
    "destination" : "dest",
    "matchedVertex" : "SOURCE",
    "directedType" : "DIRECTED"
  } ],
  "view" : {
    "edges" : {
      "group1" : { }
    }
  }
}

Python:

g.GetElements(
    view=g.View(
        edges=[
            g.ElementDefinition(
                group="group1"
            )
        ],
        all_edges=False,
        all_entities=False
    ),
    input=[
        g.EdgeSeed(
            source="source",
            destination="dest",
            directed_type="DIRECTED",
            matched_vertex="SOURCE"
        )
    ]
)

Entities
SeedMatching:
Java:

final GetElements getEntitiesWithSeedMatching = new GetElements.Builder()
        .input(new EntitySeed("vertex"))
        .seedMatching(SeedMatching.SeedMatchingType.EQUAL)
        .build();

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EntitySeed",
    "vertex" : "vertex"
  } ],
  "seedMatching" : "EQUAL"
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EntitySeed",
    "vertex" : "vertex"
  } ],
  "seedMatching" : "EQUAL"
}

Python:

g.GetElements(
    input=[
        g.EntitySeed(
            vertex="vertex"
        )
    ],
    seed_matching="EQUAL"
)

View:
Java:

final GetElements getEntitiesWithoutSeedMatching = new GetElements.Builder()
        .input(new EntitySeed("vertex"))
        .view(new View.Builder()
                .entity("group1")
                .build())
        .build();

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EntitySeed",
    "vertex" : "vertex"
  } ],
  "view" : {
    "entities" : {
      "group1" : { }
    }
  }
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EntitySeed",
    "vertex" : "vertex"
  } ],
  "view" : {
    "entities" : {
      "group1" : { }
    }
  }
}

Python:

g.GetElements(
    view=g.View(
        entities=[
            g.ElementDefinition(
                group="group1"
            )
        ],
        all_edges=False,
        all_entities=False
    ),
    input=[
        g.EntitySeed(
            vertex="vertex"
        )
    ]
)

Entities and edges
There is one limitation, however. If you use EQUAL seedMatching and specify both Edges and Entities in your input, you now need two Operations within an OperationChain, because only one View can be applied globally to all of an Operation's input. See the example below:
SeedMatching:
Java:

final GetElements getBothWithSeedMatching = new GetElements.Builder()
        .input(new EntitySeed("vertex"), new EdgeSeed("source", "dest", true))
        .seedMatching(SeedMatching.SeedMatchingType.EQUAL)
        .build();

JSON:

{
  "class" : "GetElements",
  "input" : [ {
    "class" : "EntitySeed",
    "vertex" : "vertex"
  }, {
    "class" : "EdgeSeed",
    "source" : "source",
    "destination" : "dest",
    "matchedVertex" : "SOURCE",
    "directedType" : "DIRECTED"
  } ],
  "seedMatching" : "EQUAL"
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
  "input" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.data.EntitySeed",
    "vertex" : "vertex"
  }, {
    "class" : "uk.gov.gchq.gaffer.operation.data.EdgeSeed",
    "source" : "source",
    "destination" : "dest",
    "matchedVertex" : "SOURCE",
    "directedType" : "DIRECTED"
  } ],
  "seedMatching" : "EQUAL"
}

Python:

g.GetElements(
    input=[
        g.EntitySeed(
            vertex="vertex"
        ),
        g.EdgeSeed(
            source="source",
            destination="dest",
            directed_type="DIRECTED",
            matched_vertex="SOURCE"
        )
    ],
    seed_matching="EQUAL"
)

Two Operations in OperationChain using different Views:
Java:

final OperationChain getBothWithoutSeedMatching = new OperationChain.Builder()
        .first(new GetElements.Builder()
                .input(new EntitySeed("vertex"))
                .view(new View.Builder()
                        .entity("group1")
                        .build())
                .build())
        .then(new GetElements.Builder()
                .input(new EdgeSeed("source", "dest", true))
                .view(new View.Builder()
                        .edge("group1")
                        .build())
                .build())
        .build();

JSON:

{
  "class" : "OperationChain",
  "operations" : [ {
    "class" : "GetElements",
    "input" : [ {
      "class" : "EntitySeed",
      "vertex" : "vertex"
    } ],
    "view" : {
      "entities" : {
        "group1" : { }
      }
    }
  }, {
    "class" : "GetElements",
    "input" : [ {
      "class" : "EdgeSeed",
      "source" : "source",
      "destination" : "dest",
      "matchedVertex" : "SOURCE",
      "directedType" : "DIRECTED"
    } ],
    "view" : {
      "edges" : {
        "group1" : { }
      }
    }
  } ]
}

Full JSON:

{
  "class" : "uk.gov.gchq.gaffer.operation.OperationChain",
  "operations" : [ {
    "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
    "input" : [ {
      "class" : "uk.gov.gchq.gaffer.operation.data.EntitySeed",
      "vertex" : "vertex"
    } ],
    "view" : {
      "entities" : {
        "group1" : { }
      }
    }
  }, {
    "class" : "uk.gov.gchq.gaffer.operation.impl.get.GetElements",
    "input" : [ {
      "class" : "uk.gov.gchq.gaffer.operation.data.EdgeSeed",
      "source" : "source",
      "destination" : "dest",
      "matchedVertex" : "SOURCE",
      "directedType" : "DIRECTED"
    } ],
    "view" : {
      "edges" : {
        "group1" : { }
      }
    }
  } ]
}

Python:

g.OperationChain(
    operations=[
        g.GetElements(
            view=g.View(
                entities=[
                    g.ElementDefinition(
                        group="group1"
                    )
                ],
                all_edges=False,
                all_entities=False
            ),
            input=[
                g.EntitySeed(
                    vertex="vertex"
                )
            ]
        ),
        g.GetElements(
            view=g.View(
                edges=[
                    g.ElementDefinition(
                        group="group1"
                    )
                ],
                all_edges=False,
                all_entities=False
            ),
            input=[
                g.EdgeSeed(
                    source="source",
                    destination="dest",
                    directed_type="DIRECTED",
                    matched_vertex="SOURCE"
                )
            ]
        )
    ]
)