# Class Shape

java.lang.Object
org.apache.commons.collections4.bloomfilter.Shape

public final class Shape extends Object
The definition of a Bloom filter shape.

This class contains the values for the filter configuration and is used to convert a Hasher into a BloomFilter as well as verify that two Bloom filters are compatible. (i.e. can be compared or merged)

## Interrelatedness of values

Number of Items (`n`)
`n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))`
Probability of False Positives (`p`)
`p = pow(1 - exp(-k / (m / n)), k)`
Number of Bits (`m`)
`m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2))))`
Number of Functions (`k`)
`k = round((m / n) * ln(2))`

## Estimations from cardinality based on shape

Several estimates can be calculated from the Shape and the cardinality of a Bloom filter.

In the calculation below the following values are used:

• double c = the cardinality of the Bloom filter.
• double m = numberOfBits as specified in the shape.
• double k = numberOfHashFunctions as specified in the shape.

### Estimate N - n()

The calculation for the estimate of N is: `-(m/k) * ln(1 - (c/m))`. This is the calculation performed by the `Shape.estimateN(cardinality)` method below. This estimate is roughly equivalent to the number of hashers that have been merged into a filter to create the cardinality specified.

Note:

• if cardinality == numberOfBits, then result is infinity.
• if cardinality > numberOfBits, then result is NaN.

### Estimate N of Union - n(A ∪ B)

To estimate the number of items in the union of two Bloom filters with the same shape, merge them together and calculate the estimated N from the result.

### Estimate N of the Intersection - n(A ∩ B)

To estimate the number of items in the intersection of two Bloom filters A and B with the same shape the calculation is: n(A) + n(b) - n(A ∪ B).

Care must be taken when any of the n(x) returns infinity. In general the following assumptions are true:

• If n(A) = ∞ and n(B) < ∞ then n(A ∩ B) = n(B)
• If n(A) < ∞ and n(B) = ∞ then n(A ∩ B) = n(A)
• If n(A) = ∞ and n(B) = ∞ then n(A ∩ B) = ∞
• If n(A) < ∞ and n(B) < ∞ and n(A ∪ B) = ∞ then n(A ∩ B) is undefined.
Since:
4.5
• ## Method Summary

Modifier and Type
Method
Description
`boolean`
`equals(Object obj)`

`double`
`estimateMaxN()`
Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.
`double`
`estimateN(int cardinality)`
Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.
`static Shape`
```fromKM(int numberOfHashFunctions, int numberOfBits)```
Constructs a filter configuration with the specified number of hashFunctions (`k`) and bits (`m`).
`static Shape`
```fromNM(int numberOfItems, int numberOfBits)```
Constructs a filter configuration with the specified number of items (`n`) and bits (`m`).
`static Shape`
```fromNMK(int numberOfItems, int numberOfBits, int numberOfHashFunctions)```
Constructs a filter configuration with the specified number of items, bits and hash functions.
`static Shape`
```fromNP(int numberOfItems, double probability)```
Constructs a filter configuration with the specified number of items (`n`) and desired false-positive probability (`p`).
`static Shape`
```fromPMK(double probability, int numberOfBits, int numberOfHashFunctions)```
Constructs a filter configuration with a desired false-positive probability (`p`) and the specified number of bits (`m`) and hash functions (`k`).
`int`
`getNumberOfBits()`
Gets the number of bits in the Bloom filter.
`int`
`getNumberOfHashFunctions()`
Gets the number of hash functions used to construct the filter.
`double`
`getProbability(int numberOfItems)`
Calculates the probability of false positives (`p`) given numberOfItems (`n`), numberOfBits (`m`) and numberOfHashFunctions (`k`).
`int`
`hashCode()`

`boolean`
`isSparse(int cardinality)`
Determines if a cardinality is sparse based on the shape.
`String`
`toString()`

### Methods inherited from class java.lang.Object

`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`
• ## Method Details

• ### fromKM

public static Shape fromKM(int numberOfHashFunctions, int numberOfBits)
Constructs a filter configuration with the specified number of hashFunctions (`k`) and bits (`m`).
Parameters:
`numberOfHashFunctions` - Number of hash functions to use for each item placed in the filter.
`numberOfBits` - The number of bits in the filter
Returns:
a valid Shape.
Throws:
`IllegalArgumentException` - if `numberOfHashFunctions < 1` or `numberOfBits < 1`
• ### fromNM

public static Shape fromNM(int numberOfItems, int numberOfBits)
Constructs a filter configuration with the specified number of items (`n`) and bits (`m`).

The optimal number of hash functions (`k`) is computed.

`k = round((m / n) * ln(2))`

The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

Parameters:
`numberOfItems` - Number of items to be placed in the filter
`numberOfBits` - The number of bits in the filter
Returns:
a valid Shape.
Throws:
`IllegalArgumentException` - if `numberOfItems < 1`, `numberOfBits < 1`, the calculated number of hash function is `< 1`, or if the actual probability is `>= 1.0`
• ### fromNMK

public static Shape fromNMK(int numberOfItems, int numberOfBits, int numberOfHashFunctions)
Constructs a filter configuration with the specified number of items, bits and hash functions.

The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

Parameters:
`numberOfItems` - Number of items to be placed in the filter
`numberOfBits` - The number of bits in the filter.
`numberOfHashFunctions` - The number of hash functions in the filter
Returns:
a valid Shape.
Throws:
`IllegalArgumentException` - if `numberOfItems < 1`, `numberOfBits < 1`, `numberOfHashFunctions < 1`, or if the actual probability is `>= 1.0`.
• ### fromNP

public static Shape fromNP(int numberOfItems, double probability)
Constructs a filter configuration with the specified number of items (`n`) and desired false-positive probability (`p`).

The number of bits (`m`) for the filter is computed.

`m = ceil(n * ln(p) / ln(1 / 2^ln(2)))`

The optimal number of hash functions (`k`) is computed.

`k = round((m / n) * ln(2))`

The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated number of bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

Parameters:
`numberOfItems` - Number of items to be placed in the filter
`probability` - The desired false-positive probability in the range `(0, 1)`
Returns:
a valid Shape
Throws:
`IllegalArgumentException` - if `numberOfItems < 1`, if the desired probability is not in the range `(0, 1)` or if the actual probability is `>= 1.0`.
• ### fromPMK

public static Shape fromPMK(double probability, int numberOfBits, int numberOfHashFunctions)
Constructs a filter configuration with a desired false-positive probability (`p`) and the specified number of bits (`m`) and hash functions (`k`).

The number of items (`n`) to be stored in the filter is computed.

`n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))`

The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated Bloom filter capacity (number of items). An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

Parameters:
`probability` - The desired false-positive probability in the range `(0, 1)`
`numberOfBits` - The number of bits in the filter
`numberOfHashFunctions` - The number of hash functions in the filter
Returns:
a valid Shape.
Throws:
`IllegalArgumentException` - if the desired probability is not in the range `(0, 1)`, `numberOfBits < 1`, `numberOfHashFunctions < 1`, or the actual probability is `>= 1.0`
• ### equals

public boolean equals(Object obj)
Overrides:
`equals` in class `Object`
• ### estimateMaxN

public double estimateMaxN()
Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.

The formula for deriving `k` when `m` and `n` are known is:

`k = ln2 * m / n`

Solving for `n` yields:

`n = ln2 * m / k`

Returns:
An estimate of max N.
• ### estimateN

public double estimateN(int cardinality)
Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.

Note:

• if cardinality == numberOfBits, then result is infinity.
• if cardinality > numberOfBits, then result is NaN.
Parameters:
`cardinality` - the number of enabled bits also known as the hamming value.
Returns:
An estimate of the number of items in the Bloom filter.
• ### getNumberOfBits

public int getNumberOfBits()
Gets the number of bits in the Bloom filter. This is also known as `m`.
Returns:
the number of bits in the Bloom filter (`m`).
• ### getNumberOfHashFunctions

Gets the number of hash functions used to construct the filter. This is also known as `k`.
Returns:
the number of hash functions used to construct the filter (`k`).
• ### getProbability

public double getProbability(int numberOfItems)
Calculates the probability of false positives (`p`) given numberOfItems (`n`), numberOfBits (`m`) and numberOfHashFunctions (`k`).
`p = pow(1 - exp(-k / (m / n)), k)`

This is the probability that a Bloom filter will return true for the presence of an item when it does not contain the item.

The probability assumes that the Bloom filter is filled with the expected number of items. If the filter contains fewer items then the actual probability will be lower. Thus, this returns the worst-case false positive probability for a filter that has not exceeded its expected number of items.

Parameters:
`numberOfItems` - the number of items hashed into the Bloom filter.
Returns:
the probability of false positives.
• ### hashCode

public int hashCode()
Overrides:
`hashCode` in class `Object`
• ### isSparse

public boolean isSparse(int cardinality)
Determines if a cardinality is sparse based on the shape.

This method assumes that bit maps are 64bits and indexes are 32bits. If the memory necessary to store the cardinality as indexes is less than the estimated memory for bit maps, the cardinality is determined to be `sparse`.

Parameters:
`cardinality` - the cardinality to check.
Returns:
true if the cardinality is sparse within the shape.
• ### toString

public String toString()
Overrides:
`toString` in class `Object`