Class AliasMethodDiscreteSampler

    • Field Detail

      • probability

        protected final long[] probability
        The probability table. During sampling a random index into this table is selected. A random probability is compared to the value at this index: if lower then the sample is the index; if higher then the sample uses the corresponding entry in the alias table.

        This has entries up to the last non-zero element since there is no need to store probabilities of zero. This is an optimisation for zero-padded input. Any zero value will always be aliased so any look-up index outside this table always uses the alias.

        Note that a uniform double in the range [0,1) can be generated using 53-bits from a long to sample all the dyadic rationals with a denominator of 253 (e.g. see org.apache.commons.rng.core.utils.NumberFactory.makeDouble(long)). To avoid computation of a double and comparison to the probability as a double the probabilities are stored as 53-bit longs to use integer arithmetic. This is the equivalent of storing the numerator of a fraction with the denominator of 253.

        During conversion of the probability to a double it is rounded up to the next integer value. This ensures the functionality of comparing a uniform deviate distributed evenly on the interval 1/2^53 to the unevenly distributed probability is equivalent, i.e. a uniform deviate is either below the probability or above it:

         Uniform deviate
          1/2^53    2/2^53    3/2^53    4/2^53
         --|---------|---------|---------|---
              ^
              |
          probability
                     ^
                     |
                 rounded up
         

        Round-up ensures a non-zero probability is always non-zero and zero probability remains zero. Thus any item with a non-zero input probability can always be sampled, and a zero input probability cannot be sampled.

        See Also:
        Dyadic rational
      • alias

        protected final int[] alias
        The alias table. During sampling if the random probability is not below the entry in the probability table then the sample is the alias.
    • Method Detail

      • of

        public static SharedStateDiscreteSampler of​(UniformRandomProvider rng,
                                                    double[] probabilities)
        Creates a sampler.

        The probabilities will be normalised using their sum. The only requirement is the sum is strictly positive.

        Where possible this method zero-pads the probabilities so the length is the next power-of-two. Padding is bounded by the upper limit on the size of an array.

        To avoid zero-padding use the of(UniformRandomProvider, double[], int) method with a negative alpha factor.

        Parameters:
        rng - Generator of uniformly distributed random numbers.
        probabilities - The list of probabilities.
        Returns:
        the sampler
        Throws:
        IllegalArgumentException - if probabilities is null or empty, a probability is negative, infinite or NaN, or the sum of all probabilities is not strictly positive.
        See Also:
        of(UniformRandomProvider, double[], int)
      • of

        public static SharedStateDiscreteSampler of​(UniformRandomProvider rng,
                                                    double[] probabilities,
                                                    int alpha)
        Creates a sampler.

        The probabilities will be normalised using their sum. The only requirement is the sum is strictly positive.

        Where possible this method zero-pads the probabilities to improve sampling efficiency. Padding is bounded by the upper limit on the size of an array and controlled by the alpha argument. Set to negative to disable padding.

        For each zero padded value an entry is added to the tables which is always aliased. This can be sampled with fewer bits required from the UniformRandomProvider. Increasing the padding of zeros increases the chance of using this fast path to selecting a sample. The penalty is two-fold: initialisation is bounded by O(n) time with n the size after padding; an additional memory cost of 4 bytes per padded value.

        Zero padding to any length improves performance; using a power of 2 allows the index into the tables to be more efficiently generated. The argument alpha controls the level of padding. Positive values of alpha represent a scale factor in powers of 2. The size of the input array will be increased by a factor of 2alpha and then rounded-up to the next power of 2. Padding is bounded by the upper limit on the size of an array.

        The chance of executing the slow path is upper bounded at 2-alpha when padding is enabled. Each successive doubling of padding will have diminishing performance gains.

        Parameters:
        rng - Generator of uniformly distributed random numbers.
        probabilities - The list of probabilities.
        alpha - The alpha factor controlling the zero padding.
        Returns:
        the sampler
        Throws:
        IllegalArgumentException - if probabilities is null or empty, a probability is negative, infinite or NaN, or the sum of all probabilities is not strictly positive.