Interface BloomFilter

All Superinterfaces:
BitMapExtractor, IndexExtractor
All Known Subinterfaces:
CountingBloomFilter
All Known Implementing Classes:
ArrayCountingBloomFilter, LayeredBloomFilter, SimpleBloomFilter, SparseBloomFilter, WrappedBloomFilter

public interface BloomFilter extends IndexExtractor, BitMapExtractor
The interface that describes a Bloom filter.

See implementation notes for BitMapExtractor and IndexExtractor.

Since:
4.5
See Also:
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    The sparse characteristic used to determine the best method for matching.
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    Gets the cardinality (number of enabled bits) of this Bloom filter.
    int
    Returns the characteristics of the filter.
    void
    Resets the filter to its initial, unpopulated state.
    default boolean
    contains(BitMapExtractor bitMapExtractor)
    Returns true if this filter contains the bits specified in the bit maps produced by the bitMapExtractor.
    default boolean
    Returns true if this filter contains the specified filter.
    default boolean
    contains(Hasher hasher)
    Returns true if this filter contains the bits specified in the hasher.
    boolean
    contains(IndexExtractor indexExtractor)
    Returns true if this filter contains the indices specified IndexExtractor.
    <T extends BloomFilter>
    T
    Creates a new instance of the BloomFilter with the same properties as the current one.
    default int
    Estimates the number of items in the intersection of this Bloom filter with the other bloom filter.
    default int
    Estimates the number of items in the Bloom filter.
    default int
    Estimates the number of items in the union of this Bloom filter with the other bloom filter.
    Gets the shape that was used when the filter was built.
    default boolean
    Determines if all the bits are off.
    default boolean
    Determines if the bloom filter is "full".
    boolean
    merge(BitMapExtractor bitMapExtractor)
    Merges the specified hasher into this Bloom filter.
    default boolean
    Merges the specified Bloom filter into this Bloom filter.
    default boolean
    merge(Hasher hasher)
    Merges the specified hasher into this Bloom filter.
    boolean
    merge(IndexExtractor indexExtractor)
    Merges the specified IndexExtractor into this Bloom filter.
    Most Bloom filters create unique IndexExtractors.

    Methods inherited from interface org.apache.commons.collections4.bloomfilter.BitMapExtractor

    asBitMapArray, processBitMapPairs, processBitMaps

    Methods inherited from interface org.apache.commons.collections4.bloomfilter.IndexExtractor

    asIndexArray, processIndices
  • Field Details

    • SPARSE

      static final int SPARSE
      The sparse characteristic used to determine the best method for matching.

      For `sparse` implementations the forEachIndex(IntConsumer consumer) method is more efficient. For non `sparse` implementations the forEachBitMap(LongConsumer consumer) is more efficient. Implementers should determine if it is easier for the implementation to produce indexes of bit map blocks.

      See Also:
  • Method Details

    • cardinality

      Gets the cardinality (number of enabled bits) of this Bloom filter.

      This is also known as the Hamming value or Hamming number.

      Returns:
      the cardinality of this filter
    • characteristics

      Returns the characteristics of the filter.

      Characteristics are defined as bits within the characteristics integer.

      Returns:
      the characteristics for this bloom filter.
    • clear

      void clear()
      Resets the filter to its initial, unpopulated state.
    • contains

      default boolean contains(BitMapExtractor bitMapExtractor)
      Returns true if this filter contains the bits specified in the bit maps produced by the bitMapExtractor.
      Parameters:
      bitMapExtractor - the BitMapExtractor to provide the bit maps.
      Returns:
      true if this filter is enabled for all bits specified by the bit maps
    • contains

      default boolean contains(BloomFilter other)
      Returns true if this filter contains the specified filter.

      Specifically this returns true if this filter is enabled for all bits that are enabled in the other filter. Using the bit representations this is effectively (this AND other) == other.

      Parameters:
      other - the other Bloom filter
      Returns:
      true if all enabled bits in the other filter are enabled in this filter.
    • contains

      default boolean contains(Hasher hasher)
      Returns true if this filter contains the bits specified in the hasher.

      Specifically this returns true if this filter is enabled for all bit indexes identified by the hasher. Using the bit map representations this is effectively (this AND hasher) == hasher.

      Parameters:
      hasher - the hasher to provide the indexes
      Returns:
      true if this filter is enabled for all bits specified by the hasher
    • contains

      boolean contains(IndexExtractor indexExtractor)
      Returns true if this filter contains the indices specified IndexExtractor.

      Specifically this returns true if this filter is enabled for all bit indexes identified by the IndexExtractor.

      Parameters:
      indexExtractor - the IndexExtractor to provide the indexes
      Returns:
      true if this filter is enabled for all bits specified by the IndexExtractor
    • copy

      <T extends BloomFilter> T copy()
      Creates a new instance of the BloomFilter with the same properties as the current one.
      Type Parameters:
      T - Type of BloomFilter.
      Returns:
      a copy of this BloomFilter
    • estimateIntersection

      default int estimateIntersection(BloomFilter other)
      Estimates the number of items in the intersection of this Bloom filter with the other bloom filter.

      This method produces estimate is roughly equivalent to the number of unique Hashers that have been merged into both of the filters by rounding the value from the calculation described in the Shape class Javadoc.

      estimateIntersection should only be called with Bloom filters of the same Shape. If called on Bloom filters of differing shape this method is not symmetric. If other has more bits an IllegalArgumentException may be thrown.

      Parameters:
      other - The other Bloom filter
      Returns:
      an estimate of the number of items in the intersection. If the calculated estimate is larger than Integer.MAX_VALUE then MAX_VALUE is returned.
      Throws:
      IllegalArgumentException - if the estimated N for the union of the filters is infinite.
      See Also:
    • estimateN

      default int estimateN()
      Estimates the number of items in the Bloom filter.

      By default this is the rounding of the Shape.estimateN(cardinality) calculation for the shape and cardinality of this filter.

      This produces an estimate roughly equivalent to the number of Hashers that have been merged into the filter by rounding the value from the calculation described in the Shape class Javadoc.

      Note:

      • if cardinality == numberOfBits, then result is Integer.MAX_VALUE.
      • if cardinality > numberOfBits, then an IllegalArgumentException is thrown.
      Returns:
      an estimate of the number of items in the bloom filter. Will return Integer.MAX_VALUE if the estimate is larger than Integer.MAX_VALUE.
      Throws:
      IllegalArgumentException - if the cardinality is > numberOfBits as defined in Shape.
      See Also:
    • estimateUnion

      default int estimateUnion(BloomFilter other)
      Estimates the number of items in the union of this Bloom filter with the other bloom filter.

      This produces an estimate roughly equivalent to the number of unique Hashers that have been merged into either of the filters by rounding the value from the calculation described in the Shape class Javadoc.

      estimateUnion should only be called with Bloom filters of the same Shape. If called on Bloom filters of differing shape this method is not symmetric. If other has more bits an IllegalArgumentException may be thrown.

      Parameters:
      other - The other Bloom filter
      Returns:
      an estimate of the number of items in the union. Will return Integer.MAX_VALUE if the estimate is larger than Integer.MAX_VALUE.
      See Also:
    • getShape

      Gets the shape that was used when the filter was built.
      Returns:
      The shape the filter was built with.
    • isEmpty

      default boolean isEmpty()
      Determines if all the bits are off. This is equivalent to cardinality() == 0.

      Note: This method is optimised for non-sparse filters. Implementers are encouraged to implement faster checks if possible.

      Returns:
      true if no bits are enabled, false otherwise.
    • isFull

      default boolean isFull()
      Determines if the bloom filter is "full".

      Full is defined as having no unset bits.

      Returns:
      true if the filter is full, false otherwise.
    • merge

      boolean merge(BitMapExtractor bitMapExtractor)
      Merges the specified hasher into this Bloom filter. Specifically all bit indexes that are identified by the bitMapExtractor will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain all the indexes enabled in the bitMapExtractor. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Parameters:
      bitMapExtractor - The BitMapExtractor to merge.
      Returns:
      true if the merge was successful
      Throws:
      IllegalArgumentException - if bitMapExtractor sends illegal value.
    • merge

      default boolean merge(BloomFilter other)
      Merges the specified Bloom filter into this Bloom filter.

      Specifically all bit indexes that are identified by the other will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain the other Bloom filter. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Parameters:
      other - The bloom filter to merge into this one.
      Returns:
      true if the merge was successful
    • merge

      default boolean merge(Hasher hasher)
      Merges the specified hasher into this Bloom filter. Specifically all bit indexes that are identified by the hasher will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain the hasher values. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Parameters:
      hasher - The hasher to merge.
      Returns:
      true if the merge was successful
      Throws:
      IllegalArgumentException - if hasher produces an illegal value.
    • merge

      boolean merge(IndexExtractor indexExtractor)
      Merges the specified IndexExtractor into this Bloom filter. Specifically all bit indexes that are identified by the indexExtractor will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain all the indexes of the indexExtractor. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Parameters:
      indexExtractor - The IndexExtractor to merge.
      Returns:
      true if the merge was successful
      Throws:
      IllegalArgumentException - if indexExtractor sends illegal value.
    • uniqueIndices

      Most Bloom filters create unique IndexExtractors.
      Specified by:
      uniqueIndices in interface IndexExtractor
      Returns:
      the IndexExtractor of unique values.