Interface CountingBloomFilter

All Superinterfaces:
BitMapProducer, BloomFilter, CellProducer, IndexProducer
All Known Implementing Classes:
ArrayCountingBloomFilter

public interface CountingBloomFilter extends BloomFilter, CellProducer
The interface that describes a Bloom filter that associates a count with each bit index rather than a bit. This allows reversal of merge operations with remove operations.

A counting Bloom filter is expected to function identically to a standard Bloom filter that is the merge of all the Bloom filters that have been added to and not later subtracted from the counting Bloom filter. The functional state of a CountingBloomFilter at the start and end of a series of merge and subsequent remove operations of the same Bloom filters, irrespective of remove order, is expected to be the same.

Removal of a filter that has not previously been merged results in an invalid state where the cells no longer represent a sum of merged Bloom filters. It is impossible to validate merge and remove exactly without explicitly storing all filters. Consequently such an operation may go undetected. The CountingBloomFilter maintains a state flag that is used as a warning that an operation was performed that resulted in invalid cells and thus an invalid state. For example this may occur if a cell for an index was set to negative following a remove operation.

Implementations should document the expected state of the filter after an operation that generates invalid cells, and any potential recovery options. An implementation may support a reversal of the operation to restore the state to that prior to the operation. In the event that invalid cells are adjusted to a valid range then it should be documented if there has been irreversible information loss.

Implementations may choose to throw an exception during an operation that generates invalid cells. Implementations should document the expected state of the filter after such an operation. For example are the cells not updated, partially updated or updated entirely before the exception is raised.

Since:
4.5
See Also:
  • Method Details

    • add

      boolean add(CellProducer other)
      Adds the specified CellProducer to this Bloom filter.

      Specifically all cells for the indexes identified by the other will be incremented by their corresponding values in the other.

      This method will return true if the filter is valid after the operation.

      Parameters:
      other - the CellProducer to add.
      Returns:
      true if the addition was successful and the state is valid
      See Also:
    • copy

      Creates a new instance of the CountingBloomFilter with the same properties as the current one.
      Specified by:
      copy in interface BloomFilter
      Returns:
      a copy of this CountingBloomFilter
    • getMaxCell

      int getMaxCell()
      Returns the maximum allowable value for a cell count in this Counting filter.
      Returns:
      the maximum allowable value for a cell count in this Counting filter.
    • getMaxInsert

      default int getMaxInsert(BitMapProducer bitMapProducer)
      Determines the maximum number of times the BitMapProducer could have been merged into this counting filter.
      Parameters:
      bitMapProducer - the BitMapProducer to provide the indices.
      Returns:
      the maximum number of times the BitMapProducer could have been inserted.
    • getMaxInsert

      default int getMaxInsert(BloomFilter bloomFilter)
      Determines the maximum number of times the Bloom filter could have been merged into this counting filter.
      Parameters:
      bloomFilter - the Bloom filter the check for.
      Returns:
      the maximum number of times the Bloom filter could have been inserted.
    • getMaxInsert

      int getMaxInsert(CellProducer cellProducer)
      Determines the maximum number of times the Cell Producer could have been add.
      Parameters:
      cellProducer - the producer of cells.
      Returns:
      the maximum number of times the CellProducer could have been inserted.
    • getMaxInsert

      default int getMaxInsert(Hasher hasher)
      Determines the maximum number of times the Hasher could have been merged into this counting filter.
      Parameters:
      hasher - the Hasher to provide the indices.
      Returns:
      the maximum number of times the hasher could have been inserted.
    • getMaxInsert

      default int getMaxInsert(IndexProducer idxProducer)
      Determines the maximum number of times the IndexProducer could have been merged into this counting filter.

      To determine how many times an indxProducer could have been added create a CellProducer from the indexProducer and check that

      Parameters:
      idxProducer - the producer to drive the count check.
      Returns:
      the maximum number of times the IndexProducer could have been inserted.
      See Also:
    • isValid

      boolean isValid()
      Returns true if the internal state is valid.

      This flag is a warning that an addition or subtraction of cells from this filter resulted in an invalid cell for one or more indexes. For example this may occur if a cell for an index was set to negative following a subtraction operation, or overflows the value specified by getMaxCell() following an addition operation.

      A counting Bloom filter that has an invalid state is no longer ensured to function identically to a standard Bloom filter instance that is the merge of all the Bloom filters that have been added to and not later subtracted from this counting Bloom filter.

      Note: The change to an invalid state may or may not be reversible. Implementations are expected to document their policy on recovery from an addition or removal operation that generated an invalid state.

      Returns:
      true if the state is valid
    • merge

      default boolean merge(BitMapProducer bitMapProducer)
      Merges the specified BitMap producer into this Bloom filter.

      Specifically: all cells for the indexes identified by the bitMapProducer will be incremented by 1.

      This method will return true if the filter is valid after the operation.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      bitMapProducer - the BitMapProducer
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • merge

      default boolean merge(BloomFilter other)
      Merges the specified Bloom filter into this Bloom filter.

      Specifically: all cells for the indexes identified by the other filter will be incremented by 1.

      Note: If the other filter is a counting Bloom filter the other filter's cells are ignored and it is treated as an IndexProducer.

      This method will return true if the filter is valid after the operation.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      other - the other Bloom filter
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • merge

      default boolean merge(Hasher hasher)
      Merges the specified Hasher into this Bloom filter.

      Specifically: all cells for the unique indexes identified by the hasher will be incremented by 1.

      This method will return true if the filter is valid after the operation.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      hasher - the hasher
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • merge

      default boolean merge(IndexProducer indexProducer)
      Merges the specified index producer into this Bloom filter.

      Specifically: all unique cells for the indices identified by the indexProducer will be incremented by 1.

      This method will return true if the filter is valid after the operation.

      Note: If indices that are returned multiple times should be incremented multiple times convert the IndexProducer to a CellProducer and add that.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      indexProducer - the IndexProducer
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(BitMapProducer bitMapProducer)
      Removes the specified BitMapProducer from this Bloom filter.

      Specifically all cells for the indices produced by the bitMapProducer will be decremented by 1.

      This method will return true if the filter is valid after the operation.

      Parameters:
      bitMapProducer - the BitMapProducer to provide the indexes
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(BloomFilter other)
      Removes the specified Bloom filter from this Bloom filter.

      Specifically: all cells for the indexes identified by the other filter will be decremented by 1.

      Note: If the other filter is a counting Bloom filter the other filter's cells are ignored and it is treated as an IndexProducer.

      This method will return true if the filter is valid after the operation.

      Parameters:
      other - the other Bloom filter
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(Hasher hasher)
      Removes the unique values from the specified hasher from this Bloom filter.

      Specifically all cells for the unique indices produced by the hasher will be decremented by 1.

      This method will return true if the filter is valid after the operation.

      Parameters:
      hasher - the hasher to provide the indexes
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • remove

      default boolean remove(IndexProducer indexProducer)
      Removes the values from the specified IndexProducer from the Bloom filter from this Bloom filter.

      Specifically all cells for the unique indices produced by the hasher will be decremented by 1.

      This method will return true if the filter is valid after the operation.

      Note: If indices that are returned multiple times should be decremented multiple times convert the IndexProducer to a CellProducer and subtract that.

      Parameters:
      indexProducer - the IndexProducer to provide the indexes
      Returns:
      true if the removal was successful and the state is valid
      See Also:
    • subtract

      boolean subtract(CellProducer other)
      Adds the specified CellProducer to this Bloom filter.

      Specifically all cells for the indexes identified by the other will be decremented by their corresponding values in the other.

      This method will return true if the filter is valid after the operation.

      Parameters:
      other - the CellProducer to subtract.
      Returns:
      true if the subtraction was successful and the state is valid
      See Also:
    • uniqueIndices

      Description copied from interface: BloomFilter
      Most Bloom filters create unique IndexProducers.
      Specified by:
      uniqueIndices in interface BloomFilter
      Specified by:
      uniqueIndices in interface CellProducer
      Specified by:
      uniqueIndices in interface IndexProducer
      Returns:
      the IndexProducer of unique values.