
Federated Store Changes

This page describes the changes to Gaffer's Federated Store. These changes were introduced in version 2.0.0-alpha-0.4 of Gaffer.
The main changes were the addition of the Federated Operation, and a change to how results are merged by default.

The Federated Operation

The FederatedOperationChain was removed and replaced with a new Operation, the FederatedOperation. This was added to improve the control you have over how operations are federated.
The Federated Operation has 3 key parameters: operation, graphIds and mergeFunction:

{
    "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
    "operation": {
        "class": "uk.gov.gchq.gaffer.operation.impl.get.GetAllElements"
    },
    "graphIds": [ "graphA", "graphB" ],
    "mergeFunction": {
        "class": "uk.gov.gchq.gaffer.federatedstore.util.ConcatenateMergeFunction"
    }
}

Required parameter: operation

This is the Operation you wish to be federated to the subgraphs. This can be a single Operation or an OperationChain. If you use an OperationChain, then the whole chain will be sent to the subgraphs.
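As an illustration of the two payload shapes, here is a minimal Python sketch that builds the JSON body of a FederatedOperation as a plain dictionary. The helper names `federated_operation` and `operation_chain` are hypothetical and not part of Gaffer; only the class names and the `operation`, `graphIds` and `mergeFunction` fields come from the example above.

```python
import json
from typing import Any, Dict, List, Optional

FEDERATED_OPERATION = "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation"
OPERATION_CHAIN = "uk.gov.gchq.gaffer.operation.OperationChain"


def federated_operation(
    operation: Dict[str, Any],
    graph_ids: Optional[List[str]] = None,
    merge_function: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a FederatedOperation JSON body as a dict."""
    body: Dict[str, Any] = {"class": FEDERATED_OPERATION, "operation": operation}
    # graphIds and mergeFunction are optional, so only emit them when given.
    if graph_ids is not None:
        body["graphIds"] = list(graph_ids)
    if merge_function is not None:
        body["mergeFunction"] = {"class": merge_function}
    return body


def operation_chain(*operations: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: wrap operations in an OperationChain body."""
    return {"class": OPERATION_CHAIN, "operations": list(operations)}


# A single Operation as the payload.
single = federated_operation(
    {"class": "uk.gov.gchq.gaffer.operation.impl.get.GetAllElements"},
    graph_ids=["graphA", "graphB"],
)

# A whole OperationChain as the payload; the entire chain goes to graphA.
whole_chain = federated_operation(
    operation_chain({"class": "ExampleOperation1"}, {"class": "ExampleOperation2"}),
    graph_ids=["graphA"],
)

print(json.dumps(single, indent=4))
print(json.dumps(whole_chain, indent=4))
```

The printed JSON can be submitted wherever you would normally submit an operation as JSON.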

Optional parameter: graphIds

This is a list of graph IDs which you want to send the operation to.

If the user does not specify graphIds in the Operation, then the storeConfiguredGraphIds for that store will be used. If the admin has not configured the storeConfiguredGraphIds then all graphIds will be used.
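The fallback order described above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not Gaffer's implementation; the function name and its parameters are hypothetical, and it assumes "not specified" and "not configured" are represented by `None`.

```python
from typing import List, Optional, Sequence


def resolve_graph_ids(
    requested: Optional[Sequence[str]],
    store_configured: Optional[Sequence[str]],
    all_graph_ids: Sequence[str],
) -> List[str]:
    """Model of the fallback order: user, then store config, then all graphs."""
    if requested is not None:
        # graphIds given on the FederatedOperation take priority.
        return list(requested)
    if store_configured is not None:
        # Otherwise use the admin's storeConfiguredGraphIds.
        return list(store_configured)
    # Otherwise the operation is sent to every graph.
    return list(all_graph_ids)


everything = ["graphA", "graphB", "graphC"]
print(resolve_graph_ids(["graphA"], ["graphB"], everything))
print(resolve_graph_ids(None, ["graphB"], everything))
print(resolve_graph_ids(None, None, everything))
```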

For information on sending different operations in one chain to different subgraphs, see below.

Optional parameter: mergeFunction

The mergeFunction parameter is the Function you want to use when merging the results from the subgraphs.

If the user does not specify a mergeFunction then it will be selected from the storeConfiguredMergeFunctions for that store. If the admin has not configured the storeConfiguredMergeFunctions, it will contain pre-populated mergeFunctions. Lastly, if a suitable mergeFunction is not found then a default ConcatenateMergeFunction is used.

For example, when GetElements is used as the operation inside a FederatedOperation and the user hasn't specified a mergeFunction, the pre-populated ApplyViewToElementsFunction will be selected from storeConfiguredMergeFunctions, unless the admin configured it to use something else.
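The selection order for the mergeFunction can be sketched the same way. The Python below is a model of the behaviour described above, not Gaffer code: the function name is hypothetical, the store configuration is represented as a simple dictionary from operation name to merge function name, and short class names are used for readability.

```python
from typing import Dict, Optional

DEFAULT_MERGE_FUNCTION = "ConcatenateMergeFunction"


def resolve_merge_function(
    operation_name: str,
    requested: Optional[str],
    store_configured: Dict[str, str],
) -> str:
    """Model of the order: user choice, then store config, then the default."""
    if requested is not None:
        # A mergeFunction set on the FederatedOperation always wins.
        return requested
    # Otherwise look the operation up in storeConfiguredMergeFunctions,
    # falling back to concatenation when no suitable entry exists.
    return store_configured.get(operation_name, DEFAULT_MERGE_FUNCTION)


# Assumed store configuration containing one pre-populated entry.
store_config = {"GetElements": "ApplyViewToElementsFunction"}

print(resolve_merge_function("GetElements", None, store_config))
print(resolve_merge_function("GetElements", "ConcatenateMergeFunction", store_config))
print(resolve_merge_function("ExampleOperation1", None, store_config))
```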

Migrating to a FederatedOperation

Previously, graphIds were selected in queries with the now deprecated option: gaffer.federatedstore.operation.graphIds. This is being supported while users migrate to using a FederatedOperation.

Sending an Operation to specific stores

As mentioned, the gaffer.federatedstore.operation.graphIds option is still being supported so if you have an Operation using that option, it will continue to work.
Despite the option still being supported, we recommend you migrate to using a FederatedOperation.

The gaffer.federatedstore.operation.graphIds option does not work on an OperationChain. Previously, if you wanted to send an entire OperationChain to specific graphs, you had to use a FederatedOperationChain. This has been replaced by a FederatedOperation with an OperationChain as the payload. For migration, see below.

Deprecated graphIds option on a single Operation

{
    "class": "uk.gov.gchq.gaffer.operation.impl.get.GetAllElements",
    "options": {
        "gaffer.federatedstore.operation.graphIds": "graphA"
    }
}

New FederatedOperation graphIds on a single Operation

{
    "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
    "operation": {
        "class": "uk.gov.gchq.gaffer.operation.impl.get.GetAllElements"
    },
    "graphIds": [ "graphA" ]
}

Deprecated graphIds option inside an OperationChain

{
    "class": "uk.gov.gchq.gaffer.operation.OperationChain",
    "operations": [
        {
            "class": "ExampleOperation1",
            "options": {
                "gaffer.federatedstore.operation.graphIds": "graphA"
            }
        },
        {
            "class": "ExampleOperation2",
            "options": {
                "gaffer.federatedstore.operation.graphIds": "graphB"
            }
        }
    ]
}

New FederatedOperation graphIds inside an OperationChain

{
    "class": "uk.gov.gchq.gaffer.operation.OperationChain",
    "operations": [
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "ExampleOperation1"
            },
            "graphIds": [ "graphA" ]
        },
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "ExampleOperation2"
            },
            "graphIds": [ "graphB" ]
        }
    ]
}
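If you have many stored queries to migrate, the rewrite shown in the examples above is mechanical enough to script. The following Python sketch is one possible approach, not an official migration tool. It assumes operations are held as parsed JSON dictionaries and that the option value is a comma-separated list of graph IDs; the function name `migrate_operation` is hypothetical.

```python
import copy
from typing import Any, Dict

GRAPH_IDS_OPTION = "gaffer.federatedstore.operation.graphIds"
FEDERATED_OPERATION = "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation"
OPERATION_CHAIN = "uk.gov.gchq.gaffer.operation.OperationChain"


def migrate_operation(op: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap an operation that uses the deprecated option in a FederatedOperation."""
    op = copy.deepcopy(op)  # never mutate the caller's query
    if op.get("class") == OPERATION_CHAIN:
        # Migrate each operation in the chain individually.
        op["operations"] = [migrate_operation(child) for child in op.get("operations", [])]
        return op
    options = op.get("options", {})
    if GRAPH_IDS_OPTION not in options:
        return op  # nothing to migrate
    csv = options.pop(GRAPH_IDS_OPTION)
    if not options:
        op.pop("options", None)  # drop the now-empty options block
    # Assumption: the option value is a comma-separated list of graph IDs.
    graph_ids = [graph_id.strip() for graph_id in csv.split(",") if graph_id.strip()]
    return {"class": FEDERATED_OPERATION, "operation": op, "graphIds": graph_ids}


deprecated = {
    "class": "uk.gov.gchq.gaffer.operation.impl.get.GetAllElements",
    "options": {GRAPH_IDS_OPTION: "graphA"},
}
migrated = migrate_operation(deprecated)
print(migrated)
```

Other options on the original operation are left in place on the wrapped operation; review the output before relying on it.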

Breaking change: Removal of FederatedOperationChain

The FederatedOperationChain has been removed, and where you would have used it before you should instead use a FederatedOperation with an OperationChain inside.

This is useful if you have an OperationChain and want to send different parts of the chain to different subgraphs.

Individually sending a sequence of Operations to a subgraph

You could send each operation in a sequence to the same subgraph individually by repeating the graphIds. However, this is not always efficient:

{
    "class": "uk.gov.gchq.gaffer.operation.OperationChain",
    "operations": [
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "ExampleOperation1"
            },
            "graphIds": [ "graphA" ]
        },
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "ExampleOperation2"
            },
            "graphIds": [ "graphA" ]
        },
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "ExampleOperation3"
            },
            "graphIds": [ "graphB" ]
        }
    ]
}

Removed FederatedOperationChain sending a sequence of operations to a subgraph

It is more efficient to group together sequences of Operations that will go to the same subgraph. This used to be done with a FederatedOperationChain:

{
    "class": "uk.gov.gchq.gaffer.operation.OperationChain",
    "operations": [
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperationChain",
            "operations": [
                { "class": "ExampleOperation1" },
                { "class": "ExampleOperation2" }
            ],
            "options": {
                "gaffer.federatedstore.operation.graphIds": "graphA"
            }
        },
        {
            "class": "ExampleOperation3",
            "options": {
                "gaffer.federatedstore.operation.graphIds": "graphB"
            }
        }
    ]
}

New FederatedOperation sending a sequence of operations to a subgraph

Now you should instead wrap an OperationChain inside a FederatedOperation:

{
    "class": "uk.gov.gchq.gaffer.operation.OperationChain",
    "operations": [
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "uk.gov.gchq.gaffer.operation.OperationChain",
                "operations": [
                    { "class": "ExampleOperation1" },
                    { "class": "ExampleOperation2" }
                ]
            },
            "graphIds": [ "graphA" ]
        },
        {
            "class": "uk.gov.gchq.gaffer.federatedstore.operation.FederatedOperation",
            "operation": {
                "class": "ExampleOperation3"
            },
            "graphIds": [ "graphB" ]
        }
    ]
}

Default results merging

As described above, FederatedStores now have storeConfiguredMergeFunctions that dictate how the FederatedStore will merge results from different subgraphs dependent on the Operation.

In places, these new defaults differ from the previous behaviour, so results will differ too. This can be overridden on a per-Operation basis using the mergeFunction parameter described above, or on a per-store basis by overriding storeConfiguredMergeFunctions.
Previously, all Operation results were concatenated together. This behaviour is now available as a mergeFunction within Gaffer called ConcatenateMergeFunction. Therefore, if you want a FederatedOperation to use the old behaviour, you can set the mergeFunction to ConcatenateMergeFunction (as shown above).

New Merge function examples

By default, GetElements results will be merged with ApplyViewToElementsFunction. This uses the View from the operation and applies it to all of the results, meaning the results are now re-aggregated and re-filtered using the Schema, locally in the FederatedStore. This makes the results look like they came from one graph, rather than getting back a list of Elements from different subgraphs.

By default, GetTraits results will be merged with CollectionIntersect. This returns the intersection of common store traits from the subgraphs. This behaviour is the same as before, but now it can be overridden.
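To make the difference between concatenating and intersecting concrete, here is a small Python sketch of the two merge strategies applied to per-subgraph trait lists. It models the idea only; it is not the Gaffer implementation of ConcatenateMergeFunction or CollectionIntersect, and the trait names are placeholders.

```python
from typing import Iterable, List, Set


def concatenate(results: Iterable[Iterable[str]]) -> List[str]:
    """Concatenation: every result from every subgraph, duplicates included."""
    return [item for result in results for item in result]


def intersect(results: Iterable[Iterable[str]]) -> Set[str]:
    """Intersection: only the items that every subgraph returned."""
    sets = [set(result) for result in results]
    return set.intersection(*sets) if sets else set()


# Placeholder trait names returned by two subgraphs.
graph_a_traits = ["TRAIT_X", "TRAIT_Y", "TRAIT_Z"]
graph_b_traits = ["TRAIT_Y", "TRAIT_Z"]

print(concatenate([graph_a_traits, graph_b_traits]))
print(sorted(intersect([graph_a_traits, graph_b_traits])))
```

The intersection reports only what all of the queried subgraphs support, which is why it suits GetTraits.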

By default, GetSchema results will be merged with MergeSchema. This returns an aggregated schema from the subgraphs, unless there is a conflict. This behaviour is the same as before, but now it can be overridden. For example, you may wish to use the ConcatenateMergeFunction if there is a schema conflict.

Default storeConfiguredMergeFunctions

| Operation | Merge function |
| --- | --- |
| GetElements | ApplyViewToElementsFunction |
| GetAllElements | ApplyViewToElementsFunction |
| GetSchema | MergeSchema |
| GetTraits | CollectionIntersect |
| others | ConcatenateMergeFunction |

Cache Name Suffixes

Gaffer Caches now include suffixes in the names of cache entries. This allows for multiple cache entries for different graphs to co-exist using the same cache implementation instance without any conflicts.

These suffixes can be customised, which allows graphs to share the same cache entries if desired. This only applies if the relevant graphs are all configured to use the same cache instance. Examples include load-balanced Federated Store instances sharing the same set of sub-graphs, or a shared cache entry for Named Operations that allows multiple graphs to use the same set of these operations.
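The effect of suffixes can be pictured with a toy model. The Python below is purely illustrative: the `cache_entry_name` function and the `base_suffix` naming format are assumptions made for this sketch and do not describe how Gaffer actually builds cache names.

```python
from typing import Dict, List


def cache_entry_name(base_name: str, suffix: str) -> str:
    """Hypothetical naming scheme: base name plus a per-graph suffix."""
    return f"{base_name}_{suffix}"


# One shared cache instance, modelled as a dictionary.
cache: Dict[str, List[str]] = {}

# Different suffixes give separate entries, so the graphs do not conflict.
cache[cache_entry_name("NamedOperation", "graphA")] = ["operationForA"]
cache[cache_entry_name("NamedOperation", "graphB")] = ["operationForB"]

# Two graphs customised to use the same suffix share a single entry.
shared_writer = cache_entry_name("NamedOperation", "shared")
shared_reader = cache_entry_name("NamedOperation", "shared")
cache[shared_writer] = ["commonOperation"]

print(sorted(cache))
print(cache[shared_reader])
```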

If you are upgrading from Gaffer 1.x, then you may need to examine how you currently use caches and whether any of your graphs rely on sharing the same cache (e.g. sharing Named Operations between Federated Store sub-graphs).

For details on configuring cache suffixes, see the cache section of the Store Guide.


Last update: October 9, 2023
Created: January 13, 2023