Class ArrayCountingBloomFilter

java.lang.Object
org.apache.commons.collections4.bloomfilter.ArrayCountingBloomFilter
All Implemented Interfaces:
BitMapProducer, BloomFilter, CellProducer, CountingBloomFilter, IndexProducer

public final class ArrayCountingBloomFilter extends Object implements CountingBloomFilter
A counting Bloom filter using an int array to track cells for each enabled bit.

Any operation that results in negative counts or integer overflow of counts will mark this filter as invalid. This transition is not reversible. The operation is completed in full, no exception is raised and the state is set to invalid. This allows the cells for the filter immediately prior to the operation that created the invalid state to be recovered. See the documentation in isValid() for details.

All the operations in the filter assume the cells are currently valid, for example cardinality or contains operations. Behavior of an invalid filter is undefined. It will no longer function identically to a standard Bloom filter that is the merge of all the Bloom filters that have been added to and not later subtracted from the counting Bloom filter.

The maximum supported number of items that can be stored in the filter is limited by the maximum array size combined with the Shape. For example an implementation using a Shape with a false-positive probability of 1e-6 and Integer.MAX_VALUE bits can reversibly store approximately 75 million items using 20 hash functions per item with a memory consumption of approximately 8 GB.

Since:
4.5
See Also:
  • Constructor Details

    • ArrayCountingBloomFilter

      Constructs an empty counting Bloom filter with the specified shape.
      Parameters:
      shape - the shape of the filter
  • Method Details

    • add

      public boolean add(CellProducer other)
      Description copied from interface: CountingBloomFilter
      Adds the specified CellProducer to this Bloom filter.

      Specifically all cells for the indexes identified by the other will be incremented by their corresponding values in the other.

      This method will return true if the filter is valid after the operation.

      Specified by:
      add in interface CountingBloomFilter
      Parameters:
      other - the CellProducer to add.
      Returns:
      true if the addition was successful and the state is valid
      See Also:
    • asIndexArray

      public int[] asIndexArray()
      Description copied from interface: IndexProducer
      Return a copy of the IndexProducer data as an int array.

      Indices ordering and uniqueness is not guaranteed.

      The default implementation of this method creates an array and populates it. Implementations that have access to an index array should consider returning a copy of that array if possible.

      Specified by:
      asIndexArray in interface IndexProducer
      Returns:
      An int array of the data.
    • cardinality

      public int cardinality()
      Description copied from interface: BloomFilter
      Gets the cardinality (number of enabled bits) of this Bloom filter.

      This is also known as the Hamming value or Hamming number.

      Specified by:
      cardinality in interface BloomFilter
      Returns:
      the cardinality of this filter
    • characteristics

      public int characteristics()
      Description copied from interface: BloomFilter
      Returns the characteristics of the filter.

      Characteristics are defined as bits within the characteristics integer.

      Specified by:
      characteristics in interface BloomFilter
      Returns:
      the characteristics for this bloom filter.
    • clear

      public void clear()
      Description copied from interface: BloomFilter
      Resets the filter to its initial, unpopulated state.
      Specified by:
      clear in interface BloomFilter
    • contains

      public boolean contains(BitMapProducer bitMapProducer)
      Description copied from interface: BloomFilter
      Returns true if this filter contains the bits specified in the bit maps produced by the bitMapProducer.
      Specified by:
      contains in interface BloomFilter
      Parameters:
      bitMapProducer - the BitMapProducer to provide the bit maps.
      Returns:
      true if this filter is enabled for all bits specified by the bit maps
    • contains

      public boolean contains(IndexProducer indexProducer)
      Description copied from interface: BloomFilter
      Returns true if this filter contains the indices specified IndexProducer.

      Specifically this returns true if this filter is enabled for all bit indexes identified by the IndexProducer.

      Specified by:
      contains in interface BloomFilter
      Parameters:
      indexProducer - the IndexProducer to provide the indexes
      Returns:
      true if this filter is enabled for all bits specified by the IndexProducer
    • copy

      Description copied from interface: CountingBloomFilter
      Creates a new instance of the CountingBloomFilter with the same properties as the current one.
      Specified by:
      copy in interface BloomFilter
      Specified by:
      copy in interface CountingBloomFilter
      Returns:
      a copy of this CountingBloomFilter
    • forEachBitMap

      public boolean forEachBitMap(LongPredicate consumer)
      Description copied from interface: BitMapProducer
      Each bit map is passed to the predicate in order. The predicate is applied to each bit map value, if the predicate returns false the execution is stopped, false is returned, and no further bit maps are processed.

      If the producer is empty this method will return true.

      Any exceptions thrown by the action are relayed to the caller.

      Specified by:
      forEachBitMap in interface BitMapProducer
      Parameters:
      consumer - the function to execute
      Returns:
      true if all bit maps returned true, false otherwise.
    • forEachCell

      public boolean forEachCell(CellProducer.CellConsumer consumer)
      Description copied from interface: CellProducer
      Performs the given action for each cell where the cell count is non-zero.

      Some Bloom filter implementations use a count rather than a bit flag. The term Cell is used to refer to these counts.

      Any exceptions thrown by the action are relayed to the caller. The consumer is applied to each cell. If the consumer returns false the execution is stopped, false is returned, and no further pairs are processed.

      Specified by:
      forEachCell in interface CellProducer
      Parameters:
      consumer - the action to be performed for each non-zero cell.
      Returns:
      true if all cells return true from consumer, false otherwise.
    • forEachIndex

      public boolean forEachIndex(IntPredicate consumer)
      Description copied from interface: IndexProducer
      Each index is passed to the predicate. The predicate is applied to each index value, if the predicate returns false the execution is stopped, false is returned, and no further indices are processed.

      Any exceptions thrown by the action are relayed to the caller.

      Indices ordering and uniqueness is not guaranteed.

      Specified by:
      forEachIndex in interface CellProducer
      Specified by:
      forEachIndex in interface IndexProducer
      Parameters:
      consumer - the action to be performed for each non-zero bit index.
      Returns:
      true if all indexes return true from consumer, false otherwise.
    • getMaxCell

      public int getMaxCell()
      Description copied from interface: CountingBloomFilter
      Returns the maximum allowable value for a cell count in this Counting filter.
      Specified by:
      getMaxCell in interface CountingBloomFilter
      Returns:
      the maximum allowable value for a cell count in this Counting filter.
    • getMaxInsert

      public int getMaxInsert(CellProducer cellProducer)
      Description copied from interface: CountingBloomFilter
      Determines the maximum number of times the Cell Producer could have been add.
      Specified by:
      getMaxInsert in interface CountingBloomFilter
      Parameters:
      cellProducer - the producer of cells.
      Returns:
      the maximum number of times the CellProducer could have been inserted.
    • getShape

      public Shape getShape()
      Description copied from interface: BloomFilter
      Gets the shape that was used when the filter was built.
      Specified by:
      getShape in interface BloomFilter
      Returns:
      The shape the filter was built with.
    • isValid

      public boolean isValid()
      Returns true if the internal state is valid.

      This flag is a warning that an addition or subtraction of cells from this filter resulted in an invalid cell for one or more indexes. For example this may occur if a cell for an index was set to negative following a subtraction operation, or overflows the value specified by getMaxCell() following an addition operation.

      A counting Bloom filter that has an invalid state is no longer ensured to function identically to a standard Bloom filter instance that is the merge of all the Bloom filters that have been added to and not later subtracted from this counting Bloom filter.

      Note: The change to an invalid state may or may not be reversible. Implementations are expected to document their policy on recovery from an addition or removal operation that generated an invalid state.

      Implementation note

      The state transition to invalid is permanent.

      This implementation does not correct negative cells to zero or integer overflow cells to Integer.MAX_VALUE. Thus the operation that generated invalid cells can be reversed by using the complement of the original operation with the same Bloom filter. This will restore the cells to the state prior to the invalid operation. Cells can then be extracted using forEachCell(CellConsumer).

      Specified by:
      isValid in interface CountingBloomFilter
      Returns:
      true if the state is valid
    • subtract

      public boolean subtract(CellProducer other)
      Description copied from interface: CountingBloomFilter
      Adds the specified CellProducer to this Bloom filter.

      Specifically all cells for the indexes identified by the other will be decremented by their corresponding values in the other.

      This method will return true if the filter is valid after the operation.

      Specified by:
      subtract in interface CountingBloomFilter
      Parameters:
      other - the CellProducer to subtract.
      Returns:
      true if the subtraction was successful and the state is valid
      See Also: