Class AccumuloIDBetweenSetsRetriever

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Iterable<Element>

    public class AccumuloIDBetweenSetsRetriever
    extends AccumuloSetRetriever<GetElementsBetweenSets>
    Given two sets of EntityIds, called A and B, this retrieves all Edges where one end is in set A and the other is in set B. It also returns the Entities for the EntityIds in set A.

    This is done by querying for set A and using a BloomFilter in a filtering iterator to identify edges that are likely to be between a member of set A and a member of set B. Only these edges are returned to the client, which reduces the amount of data sent to the client.

    This operates in two modes. In the first mode the seeds from both sets A and B are loaded into memory (client-side). The seeds from set B are loaded into a BloomFilter. This is passed to the iterators to filter out all edges for which the non-query end is definitely not in set B. A secondary check is done within this class to confirm that the edge is definitely between a member of set A and a member of set B (this removes any false positives, i.e. edges that passed the BloomFilter check in the iterators only by chance). This secondary check uses the in-memory set of seeds, so no false positives are returned to the user.
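
    The first mode can be illustrated with a minimal, self-contained sketch. It is not the code of this class: BetweenSetsSketch, ToyBloomFilter, Edge and edgesBetween are hypothetical names, the store is a plain in-memory list standing in for the Accumulo table, and the toy filter stands in for whatever BloomFilter implementation is actually used. It only shows the order of the two checks: an approximate filter on the non-query end, then an exact check against the in-memory set B.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the real class.
class BetweenSetsSketch {

    /** Toy Bloom filter: can give false positives, never false negatives. */
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int numHashes;

        ToyBloomFilter(int size, int numHashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.numHashes = numHashes;
        }

        // Derives the i-th bit position for a key (simple mixing, not production quality).
        private int index(String key, int i) {
            int h = key.hashCode() * 31 + i * 0x9E3779B9;
            return Math.floorMod(h ^ (h >>> 16), size);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false; // definitely not a member
                }
            }
            return true; // possibly a member
        }
    }

    /** Assumed edge shape: 'queried' is matched against set A, 'other' is the non-query end. */
    static final class Edge {
        final String queried;
        final String other;

        Edge(String queried, String other) {
            this.queried = queried;
            this.other = other;
        }
    }

    static List<Edge> edgesBetween(Set<String> setA, Set<String> setB, List<Edge> store) {
        // Load the seeds of set B into the filter that would be passed to the iterators.
        ToyBloomFilter filterB = new ToyBloomFilter(1024, 3);
        for (String seed : setB) {
            filterB.add(seed);
        }

        List<Edge> result = new ArrayList<>();
        for (Edge e : store) {
            if (!setA.contains(e.queried)) {
                continue; // stands in for "query for set A"
            }
            if (!filterB.mightContain(e.other)) {
                continue; // iterator side: non-query end is definitely not in set B
            }
            if (setB.contains(e.other)) {
                result.add(e); // client side: exact check removes false positives
            }
        }
        return result;
    }
}
```

    Because the last check uses the exact set, the result of this sketch does not depend on the filter's size or hash count; those only affect how many edges are discarded early.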

    In the second mode, where there are too many seeds to be loaded into memory, the seeds in set A are queried for in batches. The seeds in set B are loaded into two BloomFilters. The first of these is relatively small and is passed to the filtering iterator to filter out edges whose non-query end is definitely not in set B. The second, larger BloomFilter is used client-side to further reduce the chance of false positives reaching the user.
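
    The second mode can be sketched in the same spirit. Again this is an assumption-laden illustration rather than the actual implementation: BatchedBetweenSetsSketch, ToyBloomFilter, Edge, edgesBetween and queryBatch are hypothetical names, the filter sizes are arbitrary, and an in-memory list stands in for the table. Set B is only ever streamed into the two filters, and set A is consumed one batch at a time.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the real class.
class BatchedBetweenSetsSketch {

    /** Toy Bloom filter; the seed makes the two filters hash differently. */
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int numHashes;
        private final int seed;

        ToyBloomFilter(int size, int numHashes, int seed) {
            this.bits = new BitSet(size);
            this.size = size;
            this.numHashes = numHashes;
            this.seed = seed;
        }

        private int index(String key, int i) {
            int h = (key.hashCode() ^ seed) * 31 + i * 0x9E3779B9;
            return Math.floorMod(h ^ (h >>> 16), size);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Assumed edge shape: 'queried' is matched against set A, 'other' is the non-query end. */
    static final class Edge {
        final String queried;
        final String other;

        Edge(String queried, String other) {
            this.queried = queried;
            this.other = other;
        }
    }

    static List<Edge> edgesBetween(Iterable<String> seedsA, Iterable<String> seedsB,
                                   List<Edge> store, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Set B is streamed into both filters and never held as an exact set.
        ToyBloomFilter small = new ToyBloomFilter(256, 3, 1);     // would go to the iterator
        ToyBloomFilter large = new ToyBloomFilter(1 << 16, 5, 2); // stays on the client
        for (String seed : seedsB) {
            small.add(seed);
            large.add(seed);
        }

        List<Edge> result = new ArrayList<>();
        Set<String> batch = new HashSet<>();
        for (String seed : seedsA) {
            batch.add(seed);
            if (batch.size() == batchSize) {
                queryBatch(batch, small, large, store, result);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            queryBatch(batch, small, large, store, result); // final partial batch
        }
        return result;
    }

    private static void queryBatch(Set<String> batch, ToyBloomFilter small,
                                   ToyBloomFilter large, List<Edge> store, List<Edge> result) {
        for (Edge e : store) {
            if (!batch.contains(e.queried)) {
                continue; // stands in for querying this batch of set A
            }
            if (!small.mightContain(e.other)) {
                continue; // iterator side: definitely not in set B
            }
            if (large.mightContain(e.other)) {
                result.add(e); // client side: probably in set B, not guaranteed
            }
        }
    }
}
```

    Unlike the first mode, there is no exact check here, so an edge whose non-query end is outside set B can still be returned if it collides in both filters. Every genuine edge between the sets is always returned, since a Bloom filter never gives false negatives.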