Interface CountingBloomFilter

All Superinterfaces:
BitMapExtractor, BloomFilter, CellExtractor, IndexExtractor
All Known Implementing Classes:
ArrayCountingBloomFilter

The interface that describes a Bloom filter that associates a count with each bit index rather than a bit. This allows reversal of merge operations with remove operations.

A counting Bloom filter is expected to function identically to a standard Bloom filter that is the merge of all the Bloom filters that have been added to and not later subtracted from the counting Bloom filter. The functional state of a CountingBloomFilter at the start and end of a series of merge and subsequent remove operations of the same Bloom filters, irrespective of remove order, is expected to be the same.

Removal of a filter that has not previously been merged results in an invalid state where the cells no longer represent a sum of merged Bloom filters. It is impossible to validate merge and remove exactly without explicitly storing all filters. Consequently such an operation may go undetected. The CountingBloomFilter maintains a state flag that is used as a warning that an operation was performed that resulted in invalid cells and thus an invalid state. For example this may occur if a cell for an index was set to negative following a remove operation.

Implementations should document the expected state of the filter after an operation that generates invalid cells, and any potential recovery options. An implementation may support a reversal of the operation to restore the state to that prior to the operation. In the event that invalid cells are adjusted to a valid range then it should be documented if there has been irreversible information loss.

Implementations may choose to throw an exception during an operation that generates invalid cells. Implementations should document the expected state of the filter after such an operation. For example are the cells not updated, partially updated or updated entirely before the exception is raised.

Since:
4.5
See Also:
  • Method Details

    • add

      boolean add(CellExtractor other)
      Adds the specified CellExtractor to this Bloom filter.

      Specifically all cells for the indexes identified by the other will be incremented by their corresponding values in the other.

      This method will return true if the filter is valid after the operation.

      Parameters:
      other - the CellExtractor to add.
      Returns:
      true if the addition was successful and the state is valid
      See Also:
    • copy

      Creates a new instance of the CountingBloomFilter with the same properties as the current one.
      Specified by:
      copy in interface BloomFilter
      Returns:
      a copy of this CountingBloomFilter
    • getMaxCell

      int getMaxCell()
      Returns the maximum allowable value for a cell count in this Counting filter.
      Returns:
      the maximum allowable value for a cell count in this Counting filter.
    • getMaxInsert

      default int getMaxInsert(BitMapExtractor bitMapExtractor)
      Determines the maximum number of times the BitMapExtractor could have been merged into this counting filter.
      Parameters:
      bitMapExtractor - the BitMapExtractor to provide the indices.
      Returns:
      the maximum number of times the BitMapExtractor could have been inserted.
    • getMaxInsert

      default int getMaxInsert(BloomFilter bloomFilter)
      Determines the maximum number of times the Bloom filter could have been merged into this counting filter.
      Parameters:
      bloomFilter - the Bloom filter the check for.
      Returns:
      the maximum number of times the Bloom filter could have been inserted.
    • getMaxInsert

      int getMaxInsert(CellExtractor cellExtractor)
      Determines the maximum number of times the Cell Extractor could have been added.
      Parameters:
      cellExtractor - the extractor of cells.
      Returns:
      the maximum number of times the CellExtractor could have been inserted.
    • getMaxInsert

      default int getMaxInsert(Hasher hasher)
      Determines the maximum number of times the Hasher could have been merged into this counting filter.
      Parameters:
      hasher - the Hasher to provide the indices.
      Returns:
      the maximum number of times the hasher could have been inserted.
    • getMaxInsert

      default int getMaxInsert(IndexExtractor indexExtractor)
      Determines the maximum number of times the IndexExtractor could have been merged into this counting filter.

      To determine how many times an indexExtractor could have been added create a CellExtractor from the indexExtractor and check that

      Parameters:
      indexExtractor - the extractor to drive the count check.
      Returns:
      the maximum number of times the IndexExtractor could have been inserted.
      See Also:
    • isValid

      boolean isValid()
      Returns true if the internal state is valid.

      This flag is a warning that an addition or subtraction of cells from this filter resulted in an invalid cell for one or more indexes. For example this may occur if a cell for an index was set to negative following a subtraction operation, or overflows the value specified by getMaxCell() following an addition operation.

      A counting Bloom filter that has an invalid state is no longer ensured to function identically to a standard Bloom filter instance that is the merge of all the Bloom filters that have been added to and not later subtracted from this counting Bloom filter.

      Note: The change to an invalid state may or may not be reversible. Implementations are expected to document their policy on recovery from an addition or removal operation that generated an invalid state.

      Returns:
      true if the state is valid
    • merge

      default boolean merge(BitMapExtractor bitMapExtractor)
      Merges the specified BitMap extractor into this Bloom filter.

      Specifically: all cells for the indexes identified by the bitMapExtractor will be incremented by 1.

      This method will return true if the filter is valid after the operation.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      bitMapExtractor - the BitMapExtractor
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • merge

      default boolean merge(BloomFilter other)
      Merges the specified Bloom filter into this Bloom filter.

      Specifically: all cells for the indexes identified by the other filter will be incremented by 1.

      Note: If the other filter is a counting Bloom filter the other filter's cells are ignored and it is treated as an IndexExtractor.

      This method will return true if the filter is valid after the operation.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      other - the other Bloom filter
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • merge

      default boolean merge(Hasher hasher)
      Merges the specified Hasher into this Bloom filter.

      Specifically: all cells for the unique indexes identified by the hasher will be incremented by 1.

      This method will return true if the filter is valid after the operation.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      hasher - the hasher
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • merge

      default boolean merge(IndexExtractor indexExtractor)
      Merges the specified index extractor into this Bloom filter.

      Specifically: all unique cells for the indices identified by the indexExtractor will be incremented by 1.

      This method will return true if the filter is valid after the operation.

      Notes:

      • If indices that are returned multiple times should be incremented multiple times convert the IndexExtractor to a CellExtractor and add that.
      • Implementations should throw IllegalArgumentException and no other exception on bad input.
      Specified by:
      merge in interface BloomFilter
      Parameters:
      indexExtractor - the IndexExtractor
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(BitMapExtractor bitMapExtractor)
      Removes the specified BitMapExtractor from this Bloom filter.

      Specifically all cells for the indices produced by the bitMapExtractor will be decremented by 1.

      This method will return true if the filter is valid after the operation.

      Parameters:
      bitMapExtractor - the BitMapExtractor to provide the indexes
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(BloomFilter other)
      Removes the specified Bloom filter from this Bloom filter.

      Specifically: all cells for the indexes identified by the other filter will be decremented by 1.

      Note: If the other filter is a counting Bloom filter the other filter's cells are ignored and it is treated as an IndexExtractor.

      This method will return true if the filter is valid after the operation.

      Parameters:
      other - the other Bloom filter
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(Hasher hasher)
      Removes the unique values from the specified hasher from this Bloom filter.

      Specifically all cells for the unique indices produced by the hasher will be decremented by 1.

      This method will return true if the filter is valid after the operation.

      Parameters:
      hasher - the hasher to provide the indexes
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(IndexExtractor indexExtractor)
      Removes the values from the specified IndexExtractor from the Bloom filter from this Bloom filter.

      Specifically all cells for the unique indices produced by the hasher will be decremented by 1.

      This method will return true if the filter is valid after the operation.

      Note: If indices that are returned multiple times should be decremented multiple times convert the IndexExtractor to a CellExtractor and subtract that.

      Parameters:
      indexExtractor - the IndexExtractor to provide the indexes
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • subtract

      boolean subtract(CellExtractor other)
      Adds the specified CellExtractor to this Bloom filter.

      Specifically all cells for the indexes identified by the other will be decremented by their corresponding values in the other.

      This method will return true if the filter is valid after the operation.

      Parameters:
      other - the CellExtractor to subtract.
      Returns:
      true if the subtraction was successful and the state is valid
      See Also:
    • uniqueIndices

      The default implementation is a no-op since the counting bloom filter returns an unique IndexExtractor by default.
      Specified by:
      uniqueIndices in interface BloomFilter
      Specified by:
      uniqueIndices in interface CellExtractor
      Specified by:
      uniqueIndices in interface IndexExtractor
      Returns:
      this counting Bloom filter.