DiscreteSampler
, SharedStateDiscreteSampler
, SharedStateSampler<SharedStateDiscreteSampler>
public class AliasMethodDiscreteSampler extends java.lang.Object implements SharedStateDiscreteSampler
n
values each with an associated probability. If all unique items
are assigned the same probability it is more efficient to use the DiscreteUniformSampler
.
This implementation is based on the detailed explanation of the alias method by Keith Schartz and implements Vose's algorithm.
Vose, M.D., A linear algorithm for generating random numbers with a given distribution, IEEE Transactions on Software Engineering, 17, 972-975, 1991.
The algorithm will sample values in O(1)
time after a pre-processing step of
O(n)
time.
The alias tables are constructed using fraction probabilities with an assumed denominator
of 253. In the generic case sampling uses UniformRandomProvider.nextInt(int)
and the upper 53-bits from UniformRandomProvider.nextLong()
.
Zero padding the input probabilities can be used to make more sampling more efficient.
Any zero entry will always be aliased removing the requirement to compute a long
.
Increased sampling speed comes at the cost of increased storage space. The algorithm requires
approximately 12 bytes of storage per input probability, that is n * 12
for size
n
. Zero-padding only requires 4 bytes of storage per padded value as the probability is
known to be zero. A table can be padded to a power of 2 using the utility function
of(UniformRandomProvider, double[], int)
to construct the sampler.
An optimisation is performed for small table sizes that are a power of 2. In this case the
sampling uses 1 or 2 calls from UniformRandomProvider.nextInt()
to generate up to
64-bits for creation of an 11-bit index and 53-bits for the long
. This optimisation
requires a generator with a high cycle length for the lower order bits.
Larger table sizes that are a power of 2 will benefit from fast algorithms for
UniformRandomProvider.nextInt(int)
that exploit the power of 2.
Modifier and Type | Field | Description |
---|---|---|
protected int[] |
alias |
The alias table.
|
protected long[] |
probability |
The probability table.
|
protected org.apache.commons.rng.UniformRandomProvider |
rng |
Underlying source of randomness.
|
Modifier and Type | Method | Description |
---|---|---|
static SharedStateDiscreteSampler |
of(org.apache.commons.rng.UniformRandomProvider rng,
double[] probabilities) |
Creates a sampler.
|
static SharedStateDiscreteSampler |
of(org.apache.commons.rng.UniformRandomProvider rng,
double[] probabilities,
int alpha) |
Creates a sampler.
|
int |
sample() |
Creates a sample.
|
java.lang.String |
toString() |
|
SharedStateDiscreteSampler |
withUniformRandomProvider(org.apache.commons.rng.UniformRandomProvider rng) |
Create a new instance of the sampler with the same underlying state using the given
uniform random provider as the source of randomness.
|
protected final org.apache.commons.rng.UniformRandomProvider rng
protected final long[] probability
This has entries up to the last non-zero element since there is no need to store probabilities of zero. This is an optimisation for zero-padded input. Any zero value will always be aliased so any look-up index outside this table always uses the alias.
Note that a uniform double in the range [0,1) can be generated using 53-bits from a long to sample all the dyadic rationals with a denominator of 253 (e.g. see org.apache.commons.rng.core.utils.NumberFactory.makeDouble(long)). To avoid computation of a double and comparison to the probability as a double the probabilities are stored as 53-bit longs to use integer arithmetic. This is the equivalent of storing the numerator of a fraction with the denominator of 253.
During conversion of the probability to a double it is rounded up to the next integer value. This ensures the functionality of comparing a uniform deviate distributed evenly on the interval 1/2^53 to the unevenly distributed probability is equivalent, i.e. a uniform deviate is either below the probability or above it:
Uniform deviate 1/2^53 2/2^53 3/2^53 4/2^53 --|---------|---------|---------|--- ^ | probability ^ | rounded up
Round-up ensures a non-zero probability is always non-zero and zero probability remains zero. Thus any item with a non-zero input probability can always be sampled, and a zero input probability cannot be sampled.
protected final int[] alias
public int sample()
sample
in interface DiscreteSampler
public java.lang.String toString()
toString
in class java.lang.Object
public SharedStateDiscreteSampler withUniformRandomProvider(org.apache.commons.rng.UniformRandomProvider rng)
withUniformRandomProvider
in interface SharedStateSampler<SharedStateDiscreteSampler>
rng
- Generator of uniformly distributed random numbers.public static SharedStateDiscreteSampler of(org.apache.commons.rng.UniformRandomProvider rng, double[] probabilities)
The probabilities will be normalised using their sum. The only requirement is the sum is strictly positive.
Where possible this method zero-pads the probabilities so the length is the next power-of-two. Padding is bounded by the upper limit on the size of an array.
To avoid zero-padding use the
of(UniformRandomProvider, double[], int)
method with a negative
alpha
factor.
rng
- Generator of uniformly distributed random numbers.probabilities
- The list of probabilities.java.lang.IllegalArgumentException
- if probabilities
is null or empty, a
probability is negative, infinite or NaN
, or the sum of all
probabilities is not strictly positive.of(UniformRandomProvider, double[], int)
public static SharedStateDiscreteSampler of(org.apache.commons.rng.UniformRandomProvider rng, double[] probabilities, int alpha)
The probabilities will be normalised using their sum. The only requirement is the sum is strictly positive.
Where possible this method zero-pads the probabilities to improve sampling
efficiency. Padding is bounded by the upper limit on the size of an array and
controlled by the alpha
argument. Set to negative to disable
padding.
For each zero padded value an entry is added to the tables which is always
aliased. This can be sampled with fewer bits required from the
UniformRandomProvider
. Increasing the padding of zeros increases the
chance of using this fast path to selecting a sample. The penalty is
two-fold: initialisation is bounded by O(n)
time with n
the
size after padding; an additional memory cost of 4 bytes per
padded value.
Zero padding to any length improves performance; using a power of 2 allows
the index into the tables to be more efficiently generated. The argument
alpha
controls the level of padding. Positive values of alpha
represent a scale factor in powers of 2. The size of the input array will be
increased by a factor of 2alpha and then rounded-up to the next
power of 2. Padding is bounded by the upper limit on the size of an
array.
The chance of executing the slow path is upper bounded at 2-alpha when padding is enabled. Each successive doubling of padding will have diminishing performance gains.
rng
- Generator of uniformly distributed random numbers.probabilities
- The list of probabilities.alpha
- The alpha factor controlling the zero padding.java.lang.IllegalArgumentException
- if probabilities
is null or empty, a
probability is negative, infinite or NaN
, or the sum of all
probabilities is not strictly positive.Copyright © 2016–2019 The Apache Software Foundation. All rights reserved.