Skip to content

Aggregation Guide

A basic introduction to the concept of Aggregation in Gaffer can be found in the User Guide. This guide is an extension of the introduction to demonstrate more advanced usage of Aggregation and how it can be applied.

Aggregation is applied in Gaffer through an aggregation function. These can take a number of forms but the common factor between them is that they use the underlying koryphe library to provide the ElementAggregator.

Ingest Aggregation

Ingest aggregation permanently aggregates similar elements together in the Graph as they are loaded. The application of ingest aggregation is done via the Graph schema which will apply the aggregation if one of the following conditions are met:

  • An entity has the same group, vertex (e.g. ID), visibility and all groupBy property values are the same.
  • An edge has the same group, source, destination, and all groupBy property values are the same.

There are a few different use cases for applying ingest aggregation but it is largely driven by the data you have and the analysis you wish to perform. As an example, say you were expecting multiple connections of the same edge between two entities but each instance of the edge may have differing values on its properties, this could be a place to apply aggregation to sum the values etc.

Please see the ingest aggregation example for some common use cases on how this can be applied.

Query-time Aggregation

Query-time aggregation, as the name suggests, is adding aggregation to elements from within the graph query. This differs from ingest aggregation as only the results of the query will have been aggregated; the data stored in the graph remains unchanged.

Generally, to apply aggregation at query-time you must override the groupBy property to prevent the default grouping taking place. It is then possible to create your own aggregator in the query which can force the use of a different aggregation function on a property.

A simple example demonstrating query-time aggregation can be found in the user guide on filtering.

Tip

Most of the time you will want to couple query-time aggregation with a View to allow more targeted queries on the data in your graph.