Class EnhancedDoubleHasher
- All Implemented Interfaces:
Hasher
Common use for this hasher is to generate bit indices from a byte array output of a hashing or MessageDigest algorithm.
Thoughts on the hasher input
Note that it is worse to create smaller numbers for the initial
and increment
. If the initial
is smaller than
the number of bits in a filter then hashing will start at the same point when the size increases; likewise the increment
will be
the same if it remains smaller than the number of bits in the filter and so the first few indices will be the same if the number of bits
changes (but is still larger than the increment
). In a worse case scenario with small initial
and increment
for
all items, hashing may not create indices that fill the full region within a much larger filter. Imagine hashers created with initial
and increment
values less than 255 with a filter size of 30000 and number of hash functions as 5. Ignoring the
tetrahedral addition (a maximum of 20 for k=5) the max index is 255 * 4 + 255 = 1275, this covers 4.25% of the filter. This also
ignores the negative wrapping but the behavior is the same, some bits cannot be reached.
So this needs to be avoided as the filter probability assumptions will be void. If the initial
and increment
are larger
than the number of bits then the modulus will create a 'random' position and increment within the size.
- Since:
- 4.5
-
Constructor Summary
ConstructorDescriptionEnhancedDoubleHasher
(byte[] buffer) Constructs the EnhancedDoubleHasher from a byte array.EnhancedDoubleHasher
(long initial, long increment) Constructs the EnhancedDoubleHasher from 2 longs. -
Method Summary
Modifier and TypeMethodDescriptionCreates an IndexExtractor for this hasher based on the Shape.
-
Constructor Details
-
EnhancedDoubleHasher
Constructs the EnhancedDoubleHasher from a byte array.This method simplifies the conversion from a Digest or hasher algorithm output to the two values used by the EnhancedDoubleHasher.
The byte array is split in 2 and the first 8 bytes of each half are interpreted as a big-endian long value. Excess bytes are ignored. If there are fewer than 16 bytes the following conversions are made.
- If there is an odd number of bytes the excess byte is assigned to the increment value
- The bytes alloted are read in big-endian order any byte not populated is set to zero.
This ensures that small arrays generate the largest possible increment and initial values.
- Parameters:
buffer
- the buffer to extract the longs from.- Throws:
IllegalArgumentException
- is buffer length is zero.
-
EnhancedDoubleHasher
Constructs the EnhancedDoubleHasher from 2 longs. The long values will be interpreted as unsigned values.- Parameters:
initial
- The initial value for the hasher.increment
- The value to increment the hash by on each iteration.
-
-
Method Details
-
indices
Description copied from interface:Hasher
Creates an IndexExtractor for this hasher based on the Shape.The
IndexExtractor
will create indices within the range defined by the number of bits in the shape. The total number of indices will respect the number of hash functions per item defined by the shape. However the count of indices may not be a multiple of the number of hash functions if the implementation has removed duplicates.This IndexExtractor must be deterministic in that it must return the same indices for the same Shape.
No guarantee is made as to order of indices.
Duplicates indices for a single item may be produced.
-