Class Shape
This class contains the values for the filter configuration and is used to convert a Hasher into a BloomFilter as well as verify that two Bloom filters are compatible. (i.e. can be compared or merged)
Interrelatedness of values
- Number of Items (
n
) n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))
- Probability of False Positives (
p
) p = pow(1 - exp(-k / (m / n)), k)
- Number of Bits (
m
) m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2))))
- Number of Functions (
k
) k = round((m / n) * ln(2))
Estimations from cardinality based on shape
Several estimates can be calculated from the Shape and the cardinality of a Bloom filter.
In the calculation below the following values are used:
- double c = the cardinality of the Bloom filter.
- double m = numberOfBits as specified in the shape.
- double k = numberOfHashFunctions as specified in the shape.
Estimate N - n()
The calculation for the estimate of N is: -(m/k) * ln(1 - (c/m))
. This is the calculation
performed by the Shape.estimateN(cardinality)
method below. This estimate is roughly equivalent to the
number of hashers that have been merged into a filter to create the cardinality specified.
Note:
- if cardinality == numberOfBits, then result is infinity.
- if cardinality > numberOfBits, then result is NaN.
Estimate N of Union - n(A ∪ B)
To estimate the number of items in the union of two Bloom filters with the same shape, merge them together and calculate the estimated N from the result.
Estimate N of the Intersection - n(A ∩ B)
To estimate the number of items in the intersection of two Bloom filters A and B with the same shape the calculation is: n(A) + n(b) - n(A ∪ B).
Care must be taken when any of the n(x) returns infinity. In general the following assumptions are true:
- If n(A) = ∞ and n(B) < ∞ then n(A ∩ B) = n(B)
- If n(A) < ∞ and n(B) = ∞ then n(A ∩ B) = n(A)
- If n(A) = ∞ and n(B) = ∞ then n(A ∩ B) = ∞
- If n(A) < ∞ and n(B) < ∞ and n(A ∪ B) = ∞ then n(A ∩ B) is undefined.
- Since:
- 4.5
- See Also:
-
Method Summary
Modifier and TypeMethodDescriptionboolean
double
Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.double
estimateN
(int cardinality) Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.static Shape
fromKM
(int numberOfHashFunctions, int numberOfBits) Constructs a filter configuration with the specified number of hashFunctions (k
) and bits (m
).static Shape
fromNM
(int numberOfItems, int numberOfBits) Constructs a filter configuration with the specified number of items (n
) and bits (m
).static Shape
fromNMK
(int numberOfItems, int numberOfBits, int numberOfHashFunctions) Constructs a filter configuration with the specified number of items, bits and hash functions.static Shape
fromNP
(int numberOfItems, double probability) Constructs a filter configuration with the specified number of items (n
) and desired false-positive probability (p
).static Shape
fromPMK
(double probability, int numberOfBits, int numberOfHashFunctions) Constructs a filter configuration with a desired false-positive probability (p
) and the specified number of bits (m
) and hash functions (k
).int
Gets the number of bits in the Bloom filter.int
Gets the number of hash functions used to construct the filter.double
getProbability
(int numberOfItems) Calculates the probability of false positives (p
) given numberOfItems (n
), numberOfBits (m
) and numberOfHashFunctions (k
).int
hashCode()
boolean
isSparse
(int cardinality) Determines if a cardinality is sparse based on the shape.toString()
-
Method Details
-
fromKM
Constructs a filter configuration with the specified number of hashFunctions (k
) and bits (m
).- Parameters:
numberOfHashFunctions
- Number of hash functions to use for each item placed in the filter.numberOfBits
- The number of bits in the filter- Returns:
- a valid Shape.
- Throws:
IllegalArgumentException
- ifnumberOfHashFunctions < 1
ornumberOfBits < 1
-
fromNM
Constructs a filter configuration with the specified number of items (n
) and bits (m
).The optimal number of hash functions (
k
) is computed.k = round((m / n) * ln(2))
The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).
- Parameters:
numberOfItems
- Number of items to be placed in the filternumberOfBits
- The number of bits in the filter- Returns:
- a valid Shape.
- Throws:
IllegalArgumentException
- ifnumberOfItems < 1
,numberOfBits < 1
, the calculated number of hash function is< 1
, or if the actual probability is>= 1.0
-
fromNMK
Constructs a filter configuration with the specified number of items, bits and hash functions.The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).
- Parameters:
numberOfItems
- Number of items to be placed in the filternumberOfBits
- The number of bits in the filter.numberOfHashFunctions
- The number of hash functions in the filter- Returns:
- a valid Shape.
- Throws:
IllegalArgumentException
- ifnumberOfItems < 1
,numberOfBits < 1
,numberOfHashFunctions < 1
, or if the actual probability is>= 1.0
.
-
fromNP
Constructs a filter configuration with the specified number of items (n
) and desired false-positive probability (p
).The number of bits (
m
) for the filter is computed.m = ceil(n * ln(p) / ln(1 / 2^ln(2)))
The optimal number of hash functions (
k
) is computed.k = round((m / n) * ln(2))
The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated number of bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).
- Parameters:
numberOfItems
- Number of items to be placed in the filterprobability
- The desired false-positive probability in the range(0, 1)
- Returns:
- a valid Shape
- Throws:
IllegalArgumentException
- ifnumberOfItems < 1
, if the desired probability is not in the range(0, 1)
or if the actual probability is>= 1.0
.
-
fromPMK
Constructs a filter configuration with a desired false-positive probability (p
) and the specified number of bits (m
) and hash functions (k
).The number of items (
n
) to be stored in the filter is computed.n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))
The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated Bloom filter capacity (number of items). An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).
- Parameters:
probability
- The desired false-positive probability in the range(0, 1)
numberOfBits
- The number of bits in the filternumberOfHashFunctions
- The number of hash functions in the filter- Returns:
- a valid Shape.
- Throws:
IllegalArgumentException
- if the desired probability is not in the range(0, 1)
,numberOfBits < 1
,numberOfHashFunctions < 1
, or the actual probability is>= 1.0
-
equals
-
estimateMaxN
Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.The formula for deriving
k
whenm
andn
are known is:k = ln2 * m / n
Solving for
n
yields:n = ln2 * m / k
- Returns:
- An estimate of max N.
-
estimateN
Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.Note:
- if cardinality == numberOfBits, then result is infinity.
- if cardinality > numberOfBits, then result is NaN.
- Parameters:
cardinality
- the number of enabled bits also known as the hamming value.- Returns:
- An estimate of the number of items in the Bloom filter.
-
getNumberOfBits
Gets the number of bits in the Bloom filter. This is also known asm
.- Returns:
- the number of bits in the Bloom filter (
m
).
-
getNumberOfHashFunctions
Gets the number of hash functions used to construct the filter. This is also known ask
.- Returns:
- the number of hash functions used to construct the filter (
k
).
-
getProbability
Calculates the probability of false positives (p
) given numberOfItems (n
), numberOfBits (m
) and numberOfHashFunctions (k
).p = pow(1 - exp(-k / (m / n)), k)
This is the probability that a Bloom filter will return true for the presence of an item when it does not contain the item.
The probability assumes that the Bloom filter is filled with the expected number of items. If the filter contains fewer items then the actual probability will be lower. Thus, this returns the worst-case false positive probability for a filter that has not exceeded its expected number of items.
- Parameters:
numberOfItems
- the number of items hashed into the Bloom filter.- Returns:
- the probability of false positives.
-
hashCode
-
isSparse
Determines if a cardinality is sparse based on the shape.This method assumes that bit maps are 64bits and indexes are 32bits. If the memory necessary to store the cardinality as indexes is less than the estimated memory for bit maps, the cardinality is determined to be
sparse
.- Parameters:
cardinality
- the cardinality to check.- Returns:
- true if the cardinality is sparse within the shape.
-
toString
-