Class LayeredBloomFilter

java.lang.Object
org.apache.commons.collections4.bloomfilter.LayeredBloomFilter
All Implemented Interfaces:
BitMapProducer, BloomFilter, BloomFilterProducer, IndexProducer

public class LayeredBloomFilter extends Object implements BloomFilter, BloomFilterProducer
Layered Bloom filters are described in Zhiwang, Cen; Jungang, Xu; Jian, Sun (2010), "A multi-layer Bloom filter for duplicated URL detection", Proc. 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE 2010), vol. 1, pp. V1-586-V1-591, doi:10.1109/ICACTE.2010.5578947, ISBN 978-1-4244-6539-2, S2CID 3108985

In short, Layered Bloom filter contains several bloom filters arranged in layers.

  • When membership in the filter is checked each layer in turn is checked and if a match is found true is returned.
  • When merging each bloom filter is merged into the newest filter in the list of layers.
  • When questions of cardinality are asked the cardinality of the union of the enclosed Bloom filters is used.

The net result is that the layered Bloom filter can be populated with more items than the Shape would indicate and yet still return a false positive rate in line with the Shape and not the over population.

This implementation uses a LayerManager to handle the manipulation of the layers.

  • Level 0 is the oldest layer and the highest level is the newest.
  • There is always at least one enclosed filter.
  • The newest filter is the target into which merges are performed.
  • Whenever the target is retrieved, or a merge operation is performed the code checks if any older layers should be removed, and if so removes them. It also checks it a new layer should be added, and if so adds it and sets the target before the operation.
Since:
4.5
  • Constructor Details

    • LayeredBloomFilter

      public LayeredBloomFilter(Shape shape, LayerManager layerManager)
      Constructor.
      Parameters:
      shape - the Shape of the enclosed Bloom filters
      layerManager - the LayerManager to manage the layers.
  • Method Details

    • fixed

      public static LayeredBloomFilter fixed(Shape shape, int maxDepth)
      Creates a fixed size layered bloom filter that adds new filters to the list, but never merges them. List will never exceed maxDepth. As additional filters are added earlier filters are removed.
      Parameters:
      shape - The shape for the enclosed Bloom filters.
      maxDepth - The maximum depth of layers.
      Returns:
      An empty layered Bloom filter of the specified shape and depth.
    • cardinality

      public int cardinality()
      Description copied from interface: BloomFilter
      Gets the cardinality (number of enabled bits) of this Bloom filter.

      This is also known as the Hamming value or Hamming number.

      Specified by:
      cardinality in interface BloomFilter
      Returns:
      the cardinality of this filter
    • characteristics

      public int characteristics()
      Description copied from interface: BloomFilter
      Returns the characteristics of the filter.

      Characteristics are defined as bits within the characteristics integer.

      Specified by:
      characteristics in interface BloomFilter
      Returns:
      the characteristics for this bloom filter.
    • clear

      public final void clear()
      Description copied from interface: BloomFilter
      Resets the filter to its initial, unpopulated state.
      Specified by:
      clear in interface BloomFilter
    • contains

      public boolean contains(BitMapProducer bitMapProducer)
      Description copied from interface: BloomFilter
      Returns true if this filter contains the bits specified in the bit maps produced by the bitMapProducer.
      Specified by:
      contains in interface BloomFilter
      Parameters:
      bitMapProducer - the BitMapProducer to provide the bit maps.
      Returns:
      true if this filter is enabled for all bits specified by the bit maps
    • contains

      public boolean contains(BloomFilter other)
      Returns true if this any layer contained by this filter contains the specified filter.

      If the other is a BloomFilterProducer each filter within the other is checked to see if it exits within this filter.

      Specified by:
      contains in interface BloomFilter
      Parameters:
      other - the other Bloom filter
      Returns:
      true if this filter contains the other filter.
    • contains

      public boolean contains(BloomFilterProducer producer)
      Returns true if each filter within the producer exits within this filter.
      Parameters:
      producer - the BloomFilterProducer that provides the filters to check for.
      Returns:
      true if this filter contains all of the filters contained in the producer.
    • contains

      public boolean contains(Hasher hasher)
      Description copied from interface: BloomFilter
      Returns true if this filter contains the bits specified in the hasher.

      Specifically this returns true if this filter is enabled for all bit indexes identified by the hasher. Using the bit map representations this is effectively (this AND hasher) == hasher.

      Specified by:
      contains in interface BloomFilter
      Parameters:
      hasher - the hasher to provide the indexes
      Returns:
      true if this filter is enabled for all bits specified by the hasher
    • contains

      public boolean contains(IndexProducer indexProducer)
      Description copied from interface: BloomFilter
      Returns true if this filter contains the indices specified IndexProducer.

      Specifically this returns true if this filter is enabled for all bit indexes identified by the IndexProducer.

      Specified by:
      contains in interface BloomFilter
      Parameters:
      indexProducer - the IndexProducer to provide the indexes
      Returns:
      true if this filter is enabled for all bits specified by the IndexProducer
    • copy

      Description copied from interface: BloomFilter
      Creates a new instance of the BloomFilter with the same properties as the current one.
      Specified by:
      copy in interface BloomFilter
      Returns:
      a copy of this BloomFilter
    • estimateN

      public int estimateN()
      Description copied from interface: BloomFilter
      Estimates the number of items in the Bloom filter.

      By default this is the rounding of the Shape.estimateN(cardinality) calculation for the shape and cardinality of this filter.

      This produces an estimate roughly equivalent to the number of Hashers that have been merged into the filter by rounding the value from the calculation described in the Shape class Javadoc.

      Note:

      • if cardinality == numberOfBits, then result is Integer.MAX_VALUE.
      • if cardinality > numberOfBits, then an IllegalArgumentException is thrown.
      Specified by:
      estimateN in interface BloomFilter
      Returns:
      an estimate of the number of items in the bloom filter. Will return Integer.MAX_VALUE if the estimate is larger than Integer.MAX_VALUE.
      See Also:
    • estimateUnion

      public int estimateUnion(BloomFilter other)
      Description copied from interface: BloomFilter
      Estimates the number of items in the union of this Bloom filter with the other bloom filter.

      This produces an estimate roughly equivalent to the number of unique Hashers that have been merged into either of the filters by rounding the value from the calculation described in the Shape class Javadoc.

      estimateUnion should only be called with Bloom filters of the same Shape. If called on Bloom filters of differing shape this method is not symmetric. If other has more bits an IllegalArgumentException may be thrown.

      Specified by:
      estimateUnion in interface BloomFilter
      Parameters:
      other - The other Bloom filter
      Returns:
      an estimate of the number of items in the union. Will return Integer.MAX_VALUE if the estimate is larger than Integer.MAX_VALUE.
      See Also:
    • find

      public int[] find(BitMapProducer bitMapProducer)
      Finds the layers in which the BitMapProducer is found.
      Parameters:
      bitMapProducer - the BitMapProducer to search for.
      Returns:
      an array of layer indices in which the Bloom filter is found.
    • find

      public int[] find(BloomFilter bf)
      Finds the layers in which the Bloom filter is found.
      Parameters:
      bf - the Bloom filter to search for.
      Returns:
      an array of layer indices in which the Bloom filter is found.
    • find

      public int[] find(Hasher hasher)
      Finds the layers in which the Hasher is found.
      Parameters:
      hasher - the Hasher to search for.
      Returns:
      an array of layer indices in which the Bloom filter is found.
    • find

      public int[] find(IndexProducer indexProducer)
      Finds the layers in which the IndexProducer is found.
      Parameters:
      indexProducer - the Index producer to search for.
      Returns:
      an array of layer indices in which the Bloom filter is found.
    • flatten

      public BloomFilter flatten()
      Create a standard (non-layered) Bloom filter by merging all of the layers. If the filter is empty this method will return an empty Bloom filter.
      Specified by:
      flatten in interface BloomFilterProducer
      Returns:
      the merged bloom filter.
    • forEachBitMap

      public boolean forEachBitMap(LongPredicate predicate)
      Description copied from interface: BitMapProducer
      Each bit map is passed to the predicate in order. The predicate is applied to each bit map value, if the predicate returns false the execution is stopped, false is returned, and no further bit maps are processed.

      If the producer is empty this method will return true.

      Any exceptions thrown by the action are relayed to the caller.

      Specified by:
      forEachBitMap in interface BitMapProducer
      Parameters:
      predicate - the function to execute
      Returns:
      true if all bit maps returned true, false otherwise.
    • forEachBloomFilter

      public final boolean forEachBloomFilter(Predicate<BloomFilter> bloomFilterPredicate)
      Processes the Bloom filters in depth order with the most recent filters first. Each filter is passed to the predicate in turn. The function exits on the first false returned by the predicate.
      Specified by:
      forEachBloomFilter in interface BloomFilterProducer
      Parameters:
      bloomFilterPredicate - the predicate to execute.
      Returns:
      true if all filters passed the predicate, false otherwise.
    • forEachIndex

      public boolean forEachIndex(IntPredicate predicate)
      Description copied from interface: IndexProducer
      Each index is passed to the predicate. The predicate is applied to each index value, if the predicate returns false the execution is stopped, false is returned, and no further indices are processed.

      Any exceptions thrown by the action are relayed to the caller.

      Indices ordering and uniqueness is not guaranteed.

      Specified by:
      forEachIndex in interface IndexProducer
      Parameters:
      predicate - the action to be performed for each non-zero bit index.
      Returns:
      true if all indexes return true from consumer, false otherwise.
    • get

      public BloomFilter get(int depth)
      Gets the Bloom filter at the specified depth
      Parameters:
      depth - the depth of the filter to return.
      Returns:
      the Bloom filter at the specified depth.
      Throws:
      NoSuchElementException - if depth is not in the range [0,getDepth())
    • getDepth

      public final int getDepth()
      Gets the depth of the deepest layer. The minimum value returned by this method is 1.
      Returns:
      the depth of the deepest layer.
    • getShape

      public final Shape getShape()
      Description copied from interface: BloomFilter
      Gets the shape that was used when the filter was built.
      Specified by:
      getShape in interface BloomFilter
      Returns:
      The shape the filter was built with.
    • isEmpty

      public boolean isEmpty()
      Description copied from interface: BloomFilter
      Determines if all the bits are off. This is equivalent to cardinality() == 0.

      Note: This method is optimised for non-sparse filters. Implementers are encouraged to implement faster checks if possible.

      Specified by:
      isEmpty in interface BloomFilter
      Returns:
      true if no bits are enabled, false otherwise.
    • merge

      public boolean merge(BitMapProducer bitMapProducer)
      Description copied from interface: BloomFilter
      Merges the specified hasher into this Bloom filter. Specifically all bit indexes that are identified by the producer will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain all the indexes enabled in the producer. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      bitMapProducer - The producer to merge.
      Returns:
      true if the merge was successful
    • merge

      public boolean merge(BloomFilter bf)
      Description copied from interface: BloomFilter
      Merges the specified Bloom filter into this Bloom filter.

      Specifically all bit indexes that are identified by the other will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain the other Bloom filter. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      bf - The bloom filter to merge into this one.
      Returns:
      true if the merge was successful
    • merge

      public boolean merge(IndexProducer indexProducer)
      Description copied from interface: BloomFilter
      Merges the specified IndexProducer into this Bloom filter. Specifically all bit indexes that are identified by the producer will be enabled in this filter.

      Note: This method should return true even if no additional bit indexes were enabled. A false result indicates that this filter may or may not contain all the indexes of the producer. This state may occur in complex Bloom filter implementations like counting Bloom filters.

      Specified by:
      merge in interface BloomFilter
      Parameters:
      indexProducer - The IndexProducer to merge.
      Returns:
      true if the merge was successful
    • next

      public void next()
      Forces and advance to the next layer. Executes the same logic as when LayerManager.extendCheck returns true
      See Also: