java.lang.Object
org.apache.commons.collections4.bloomfilter.Shape

public final class Shape extends Object
The definition of a Bloom filter shape.

This class contains the values for the filter configuration and is used to convert a Hasher into a BloomFilter as well as verify that two Bloom filters are compatible. (i.e. can be compared or merged)

Interrelatedness of values

Number of Items (n)
n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))
Probability of False Positives (p)
p = pow(1 - exp(-k / (m / n)), k)
Number of Bits (m)
m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2))))
Number of Functions (k)
k = round((m / n) * ln(2))

Estimations from cardinality based on shape

Several estimates can be calculated from the Shape and the cardinality of a Bloom filter.

In the calculation below the following values are used:

  • double c = the cardinality of the Bloom filter.
  • double m = numberOfBits as specified in the shape.
  • double k = numberOfHashFunctions as specified in the shape.

Estimate N - n()

The calculation for the estimate of N is: -(m/k) * ln(1 - (c/m)). This is the calculation performed by the Shape.estimateN(cardinality) method below. This estimate is roughly equivalent to the number of hashers that have been merged into a filter to create the cardinality specified.

Note:

  • if cardinality == numberOfBits, then result is infinity.
  • if cardinality > numberOfBits, then result is NaN.

Estimate N of Union - n(A ∪ B)

To estimate the number of items in the union of two Bloom filters with the same shape, merge them together and calculate the estimated N from the result.

Estimate N of the Intersection - n(A ∩ B)

To estimate the number of items in the intersection of two Bloom filters A and B with the same shape the calculation is: n(A) + n(b) - n(A ∪ B).

Care must be taken when any of the n(x) returns infinity. In general the following assumptions are true:

  • If n(A) = ∞ and n(B) < ∞ then n(A ∩ B) = n(B)
  • If n(A) < ∞ and n(B) = ∞ then n(A ∩ B) = n(A)
  • If n(A) = ∞ and n(B) = ∞ then n(A ∩ B) = ∞
  • If n(A) < ∞ and n(B) < ∞ and n(A ∪ B) = ∞ then n(A ∩ B) is undefined.
Since:
4.5
See Also:
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
     
    double
    Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.
    double
    estimateN(int cardinality)
    Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.
    static Shape
    fromKM(int numberOfHashFunctions, int numberOfBits)
    Constructs a filter configuration with the specified number of hashFunctions (k) and bits (m).
    static Shape
    fromNM(int numberOfItems, int numberOfBits)
    Constructs a filter configuration with the specified number of items (n) and bits (m).
    static Shape
    fromNMK(int numberOfItems, int numberOfBits, int numberOfHashFunctions)
    Constructs a filter configuration with the specified number of items, bits and hash functions.
    static Shape
    fromNP(int numberOfItems, double probability)
    Constructs a filter configuration with the specified number of items (n) and desired false-positive probability (p).
    static Shape
    fromPMK(double probability, int numberOfBits, int numberOfHashFunctions)
    Constructs a filter configuration with a desired false-positive probability (p) and the specified number of bits (m) and hash functions (k).
    int
    Gets the number of bits in the Bloom filter.
    int
    Gets the number of hash functions used to construct the filter.
    double
    getProbability(int numberOfItems)
    Calculates the probability of false positives (p) given numberOfItems (n), numberOfBits (m) and numberOfHashFunctions (k).
    int
     
    boolean
    isSparse(int cardinality)
    Determines if a cardinality is sparse based on the shape.
     

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Method Details

    • fromKM

      public static Shape fromKM(int numberOfHashFunctions, int numberOfBits)
      Constructs a filter configuration with the specified number of hashFunctions (k) and bits (m).
      Parameters:
      numberOfHashFunctions - Number of hash functions to use for each item placed in the filter.
      numberOfBits - The number of bits in the filter
      Returns:
      a valid Shape.
      Throws:
      IllegalArgumentException - if numberOfHashFunctions < 1 or numberOfBits < 1
    • fromNM

      public static Shape fromNM(int numberOfItems, int numberOfBits)
      Constructs a filter configuration with the specified number of items (n) and bits (m).

      The optimal number of hash functions (k) is computed.

      k = round((m / n) * ln(2))

      The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

      Parameters:
      numberOfItems - Number of items to be placed in the filter
      numberOfBits - The number of bits in the filter
      Returns:
      a valid Shape.
      Throws:
      IllegalArgumentException - if numberOfItems < 1, numberOfBits < 1, the calculated number of hash function is < 1, or if the actual probability is >= 1.0
    • fromNMK

      public static Shape fromNMK(int numberOfItems, int numberOfBits, int numberOfHashFunctions)
      Constructs a filter configuration with the specified number of items, bits and hash functions.

      The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

      Parameters:
      numberOfItems - Number of items to be placed in the filter
      numberOfBits - The number of bits in the filter.
      numberOfHashFunctions - The number of hash functions in the filter
      Returns:
      a valid Shape.
      Throws:
      IllegalArgumentException - if numberOfItems < 1, numberOfBits < 1, numberOfHashFunctions < 1, or if the actual probability is >= 1.0.
    • fromNP

      public static Shape fromNP(int numberOfItems, double probability)
      Constructs a filter configuration with the specified number of items (n) and desired false-positive probability (p).

      The number of bits (m) for the filter is computed.

      m = ceil(n * ln(p) / ln(1 / 2^ln(2)))

      The optimal number of hash functions (k) is computed.

      k = round((m / n) * ln(2))

      The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated number of bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

      Parameters:
      numberOfItems - Number of items to be placed in the filter
      probability - The desired false-positive probability in the range (0, 1)
      Returns:
      a valid Shape
      Throws:
      IllegalArgumentException - if numberOfItems < 1, if the desired probability is not in the range (0, 1) or if the actual probability is >= 1.0.
    • fromPMK

      public static Shape fromPMK(double probability, int numberOfBits, int numberOfHashFunctions)
      Constructs a filter configuration with a desired false-positive probability (p) and the specified number of bits (m) and hash functions (k).

      The number of items (n) to be stored in the filter is computed.

      n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))

      The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated Bloom filter capacity (number of items). An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

      Parameters:
      probability - The desired false-positive probability in the range (0, 1)
      numberOfBits - The number of bits in the filter
      numberOfHashFunctions - The number of hash functions in the filter
      Returns:
      a valid Shape.
      Throws:
      IllegalArgumentException - if the desired probability is not in the range (0, 1), numberOfBits < 1, numberOfHashFunctions < 1, or the actual probability is >= 1.0
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • estimateMaxN

      public double estimateMaxN()
      Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.

      The formula for deriving k when m and n are known is:

      k = ln2 * m / n

      Solving for n yields:

      n = ln2 * m / k

      Returns:
      An estimate of max N.
    • estimateN

      public double estimateN(int cardinality)
      Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.

      Note:

      • if cardinality == numberOfBits, then result is infinity.
      • if cardinality > numberOfBits, then result is NaN.
      Parameters:
      cardinality - the number of enabled bits also known as the hamming value.
      Returns:
      An estimate of the number of items in the Bloom filter.
    • getNumberOfBits

      public int getNumberOfBits()
      Gets the number of bits in the Bloom filter. This is also known as m.
      Returns:
      the number of bits in the Bloom filter (m).
    • getNumberOfHashFunctions

      Gets the number of hash functions used to construct the filter. This is also known as k.
      Returns:
      the number of hash functions used to construct the filter (k).
    • getProbability

      public double getProbability(int numberOfItems)
      Calculates the probability of false positives (p) given numberOfItems (n), numberOfBits (m) and numberOfHashFunctions (k).
      p = pow(1 - exp(-k / (m / n)), k)

      This is the probability that a Bloom filter will return true for the presence of an item when it does not contain the item.

      The probability assumes that the Bloom filter is filled with the expected number of items. If the filter contains fewer items then the actual probability will be lower. Thus, this returns the worst-case false positive probability for a filter that has not exceeded its expected number of items.

      Parameters:
      numberOfItems - the number of items hashed into the Bloom filter.
      Returns:
      the probability of false positives.
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • isSparse

      public boolean isSparse(int cardinality)
      Determines if a cardinality is sparse based on the shape.

      This method assumes that bit maps are 64bits and indexes are 32bits. If the memory necessary to store the cardinality as indexes is less than the estimated memory for bit maps, the cardinality is determined to be sparse.

      Parameters:
      cardinality - the cardinality to check.
      Returns:
      true if the cardinality is sparse within the shape.
    • toString

      public String toString()
      Overrides:
      toString in class Object