Class AccumuloIDWithinSetRetriever

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Iterable<Element>

    public class AccumuloIDWithinSetRetriever
    extends AccumuloSetRetriever<GetElementsWithinSet>
    Retrieves Edges where both ends are in a given set of EntityIds, and Entitys whose vertex is in that set.

    BloomFilters are used on the server to identify edges that are likely to be between members of the set, and only those edges are sent to the client. This reduces the amount of data sent to the client.

    This operates in two modes. In the first mode the seeds are loaded into memory on the client and also added to a BloomFilter. The BloomFilter is passed to the iterators, which filter out all edges that are definitely not between elements of the set. A secondary check within this class, using the in-memory set of seeds, then confirms that each edge is definitely between elements of the set. This removes any false positives (edges that passed the BloomFilter check in the iterators), so no false positives are returned to the user.
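The first mode can be illustrated with a minimal, self-contained sketch. It is not the class's actual implementation: `SimpleBloomFilter`, `InMemoryWithinSetSketch` and `edgesWithinSet` are hypothetical names, vertices are plain strings, an edge is a two-element list of source and destination, and a loop over a list stands in for the server-side iterators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the BloomFilter that is passed to the iterators.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    SimpleBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    void add(String vertex) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(vertex, i));
        }
    }

    // May return true for a vertex that was never added (a false positive),
    // but never returns false for a vertex that was added.
    boolean mightContain(String vertex) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(vertex, i))) {
                return false;
            }
        }
        return true;
    }

    private int index(String vertex, int i) {
        int h = vertex.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, size);
    }
}

// Hypothetical sketch of the first mode: BloomFilter pre-filter plus exact check.
final class InMemoryWithinSetSketch {
    static List<List<String>> edgesWithinSet(Set<String> seeds, List<List<String>> scannedEdges) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        for (String seed : seeds) {
            filter.add(seed);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> edge : scannedEdges) {
            String source = edge.get(0);
            String destination = edge.get(1);
            // Stand-in for the iterators: drop edges that are definitely not within the set.
            if (!filter.mightContain(source) || !filter.mightContain(destination)) {
                continue;
            }
            // Secondary check against the in-memory seeds removes any false positives.
            if (seeds.contains(source) && seeds.contains(destination)) {
                result.add(edge);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> seeds = new HashSet<>(Arrays.asList("A", "B", "C"));
        List<List<String>> edges = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("A", "X"),
                Arrays.asList("Y", "Z"), Arrays.asList("C", "A"));
        System.out.println(edgesWithinSet(seeds, edges)); // prints [[A, B], [C, A]]
    }
}
```

Because the exact check always runs last, the result does not depend on how many false positives the filter lets through; the filter only affects how much data would be sent to the client.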

    In the second mode, where there are too many seeds to load into memory, the seeds are queried one batch at a time. When the first batch is queried, a BloomFilter of the first batch is created and passed to the iterators. This filters out all edges that are definitely not between elements of the first batch. When the second batch is queried, the second batch is added to the same BloomFilter, which is again passed to the iterators. They filter out all edges that are definitely not between an element of the second batch and an element of the first or second batch. This process repeats until all seeds have been queried. It is best thought of as a square split into a grid (with the same number of squares in both dimensions), where each axis lists the batches of seeds and each cell holds the edges between one pair of batches. Because the seeds cannot all be held in memory for an exact check, a client-side BloomFilter is used instead to further reduce the chance of false positives reaching the user.
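The batched mode can be sketched in the same spirit. This is an illustration under stated assumptions, not the class's actual code: `BatchBloomFilter` and `BatchedWithinSetSketch` are hypothetical names, the filter sizes and hash counts are arbitrary, each scan is assumed to be keyed on the current batch of seeds, and the client-side filter is assumed to grow batch by batch alongside the one passed to the iterators.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal Bloom filter; the salt gives each filter a different hash family.
final class BatchBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;
    private final int salt;

    BatchBloomFilter(int size, int numHashes, int salt) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
        this.salt = salt;
    }

    void add(String vertex) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(vertex, i));
        }
    }

    boolean mightContain(String vertex) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(vertex, i))) {
                return false;
            }
        }
        return true;
    }

    private int index(String vertex, int i) {
        int h = (vertex.hashCode() ^ salt) * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, size);
    }
}

// Hypothetical sketch of the second mode: seeds are processed one batch at a time.
final class BatchedWithinSetSketch {
    static Set<List<String>> edgesWithinSet(List<List<String>> seedBatches,
                                            List<List<String>> allEdges) {
        // Filter handed to the (simulated) iterators; it accumulates every batch seen so far.
        BatchBloomFilter serverFilter = new BatchBloomFilter(1024, 3, 0);
        // Assumed larger client-side filter with a different hash family.
        BatchBloomFilter clientFilter = new BatchBloomFilter(8192, 5, 0x5bd1e995);
        Set<List<String>> result = new LinkedHashSet<>();

        for (List<String> batch : seedBatches) {
            Set<String> current = new HashSet<>(batch); // only one batch is held in memory
            for (String seed : batch) {
                serverFilter.add(seed);
                clientFilter.add(seed);
            }
            for (List<String> edge : allEdges) {
                String a = edge.get(0);
                String b = edge.get(1);
                // One end must be a seed of the current batch; the other end must pass both filters.
                boolean viaA = current.contains(a) && passes(b, serverFilter, clientFilter);
                boolean viaB = current.contains(b) && passes(a, serverFilter, clientFilter);
                if (viaA || viaB) {
                    result.add(edge); // the set removes duplicates across batches
                }
            }
        }
        return result;
    }

    private static boolean passes(String vertex, BatchBloomFilter server, BatchBloomFilter client) {
        return server.mightContain(vertex) && client.mightContain(vertex);
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C", "D"));
        List<List<String>> edges = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("A", "C"), Arrays.asList("D", "B"),
                Arrays.asList("A", "X"), Arrays.asList("Y", "Z"));
        System.out.println(edgesWithinSet(batches, edges));
    }
}
```

In this sketch the edge between A and C is missed while the first batch is scanned, because C is not yet in the filters, and is picked up while the second batch is scanned, because A is already there. Unlike the first mode, an edge such as A to X could in principle be returned if X were a false positive in both filters; the second filter only makes that less likely.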