Class BloomFilterUtils


  • public final class BloomFilterUtils
    extends Object
    Utility methods for creating Bloom filters.
    • Method Summary

      Modifier and Type Method Description
      static int calculateBloomFilterSize(double falsePositiveRate, int numItemsToBeAdded, int maximumSize)
      Calculates the size of the BloomFilter needed to achieve the desired false positive rate when the specified number of items is added to the set, capped at the specified maximum size.
      static int calculateNumHashes(int bloomFilterSize, int numItemsToBeAdded)
      Calculates the optimal number of hash functions to use in a BloomFilter of the given size, to which the given number of items will be added.
      static org.apache.hadoop.util.bloom.BloomFilter getBloomFilter(double falsePositiveRate, int numItemsToBeAdded, int maximumSize)
      Returns a BloomFilter of the necessary size to achieve the given false positive rate (subject to the given maximum size), configured with the optimal number of hash functions.
      static org.apache.hadoop.util.bloom.BloomFilter getBloomFilter(int size)
      Returns a BloomFilter of the given size.
    • Method Detail

      • calculateBloomFilterSize

        public static int calculateBloomFilterSize(double falsePositiveRate,
                                                   int numItemsToBeAdded,
                                                   int maximumSize)
        Calculates the size of the BloomFilter needed to achieve the desired false positive rate when the specified number of items is added to the set, capped at the specified maximum size.
        Parameters:
        falsePositiveRate - the desired false positive rate
        numItemsToBeAdded - the number of items that will be added to the filter
        maximumSize - the upper limit on the size of the filter
        Returns:
        An integer representing the required size of the Bloom filter.
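The sizing rule described above can be sketched with the standard Bloom filter formula, m = -n ln(p) / (ln 2)^2, followed by a cap. This is an illustrative sketch only: the class name `BloomSizeSketch`, the method name `optimalSize`, the choice to round up, and the argument checks are assumptions, and the exact arithmetic and rounding used by `calculateBloomFilterSize` are not stated in this page.

```java
// Hypothetical sketch, not the library's implementation.
final class BloomSizeSketch {

    // Standard optimal size m = -n * ln(p) / (ln 2)^2, rounded up (an assumption),
    // then limited to maximumSize.
    static int optimalSize(double falsePositiveRate, int numItems, int maximumSize) {
        if (falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("falsePositiveRate must be in (0, 1)");
        }
        if (numItems <= 0 || maximumSize <= 0) {
            throw new IllegalArgumentException("numItems and maximumSize must be positive");
        }
        double ln2 = Math.log(2.0);
        double size = Math.ceil(-numItems * Math.log(falsePositiveRate) / (ln2 * ln2));
        // Compare as doubles so a very large result cannot overflow int before the cap applies.
        return (int) Math.min(size, (double) maximumSize);
    }

    public static void main(String[] args) {
        System.out.println(optimalSize(0.01, 1000, Integer.MAX_VALUE)); // prints 9586
        System.out.println(optimalSize(0.01, 1000, 5000));              // prints 5000 (capped)
    }
}
```

When the cap applies, the returned size is smaller than the formula asks for, so the filter's actual false positive rate will be higher than the rate requested.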
      • calculateNumHashes

        public static int calculateNumHashes(int bloomFilterSize,
                                             int numItemsToBeAdded)
        Calculates the optimal number of hash functions to use in a BloomFilter of the given size, to which the given number of items will be added.
        Parameters:
        bloomFilterSize - the size of the Bloom filter
        numItemsToBeAdded - the number of items that will be added to the filter
        Returns:
        An integer representing the optimal number of hash functions to use.
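For a filter of size m holding n items, the textbook optimum for the number of hash functions is k = (m / n) ln 2. The sketch below applies that formula. The class name `BloomHashCountSketch`, the method name `optimalNumHashes`, rounding to the nearest integer, and the floor of one hash function are assumptions made for illustration; this page does not say how `calculateNumHashes` rounds its result.

```java
// Hypothetical sketch, not the library's implementation.
final class BloomHashCountSketch {

    // Textbook optimum k = (m / n) * ln 2, rounded to the nearest integer
    // and never less than 1 (both rounding choices are assumptions).
    static int optimalNumHashes(int bloomFilterSize, int numItems) {
        if (bloomFilterSize <= 0 || numItems <= 0) {
            throw new IllegalArgumentException("bloomFilterSize and numItems must be positive");
        }
        // Cast before dividing to avoid integer division truncating m / n.
        double k = ((double) bloomFilterSize / numItems) * Math.log(2.0);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        System.out.println(optimalNumHashes(9586, 1000)); // prints 7
        System.out.println(optimalNumHashes(5000, 1000)); // prints 3
        System.out.println(optimalNumHashes(10, 1000));   // prints 1 (floor of one hash)
    }
}
```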
      • getBloomFilter

        public static org.apache.hadoop.util.bloom.BloomFilter getBloomFilter(double falsePositiveRate,
                                                                              int numItemsToBeAdded,
                                                                              int maximumSize)
        Returns a BloomFilter of the necessary size to achieve the given false positive rate (subject to the given maximum size), configured with the optimal number of hash functions.
        Parameters:
        falsePositiveRate - the desired false positive rate
        numItemsToBeAdded - the number of items that will be added to the filter
        maximumSize - the upper limit on the size of the filter
        Returns:
        A new BloomFilter with the desired settings.
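Because the size is subject to a maximum, the filter returned here may not reach the requested false positive rate. One way to check is the standard approximation p ≈ (1 - e^(-kn/m))^k for a filter of size m with k hash functions and n items. The sketch below evaluates it; the class and method names are hypothetical, the sample sizes and hash counts are illustrative values rather than output of this class, and the formula is the usual textbook estimate, not something this page specifies.

```java
// Hypothetical sketch for estimating the false positive rate of a configured filter.
final class BloomFalsePositiveSketch {

    // Standard approximation p = (1 - e^(-k*n/m))^k.
    static double expectedFalsePositiveRate(int bloomFilterSize, int numHashes, int numItems) {
        if (bloomFilterSize <= 0 || numHashes <= 0 || numItems < 0) {
            throw new IllegalArgumentException("size and hashes must be positive, items non-negative");
        }
        // Fraction of positions expected to be set after all items are added.
        double fractionSet = 1.0 - Math.exp(-(double) numHashes * numItems / bloomFilterSize);
        return Math.pow(fractionSet, numHashes);
    }

    public static void main(String[] args) {
        // A filter large enough for a 1% target with 1000 items: roughly 0.01.
        System.out.printf("%.4f%n", expectedFalsePositiveRate(9586, 7, 1000));
        // The same 1000 items in a filter capped at 5000: roughly 0.09.
        System.out.printf("%.4f%n", expectedFalsePositiveRate(5000, 3, 1000));
    }
}
```

If the estimate for a capped filter is unacceptable, the options are to raise `maximumSize` or to add fewer items per filter.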
      • getBloomFilter

        public static org.apache.hadoop.util.bloom.BloomFilter getBloomFilter(int size)
        Returns a BloomFilter of the given size.
        Parameters:
        size - the size of the Bloom filter to create
        Returns:
        A new BloomFilter of the desired size.