Class FastLoadedDiceRollerDiscreteSampler

  • All Implemented Interfaces:
    DiscreteSampler, SharedStateDiscreteSampler, SharedStateSampler<SharedStateDiscreteSampler>

    public abstract class FastLoadedDiceRollerDiscreteSampler
    extends Object
    implements SharedStateDiscreteSampler
    Distribution sampler that uses the Fast Loaded Dice Roller (FLDR). It can be used to sample from n values each with an associated relative weight. If all unique items are assigned the same weight it is more efficient to use the DiscreteUniformSampler.

    Given a list L of n positive numbers, where L[i] represents the relative weight of the ith side, FLDR returns integer i with relative probability L[i].

    FLDR produces exact samples from the specified probability distribution.

    • For integer weights, the probability of returning i is precisely equal to the rational number L[i] / m, where m is the sum of L.
    • For floating-points weights, each weight L[i] is converted to the corresponding rational number p[i] / q[i] where p[i] is a positive integer and q[i] is a power of 2. The rational weights are then normalized (exactly) to sum to unity.

    Note that if exact samples are not required then an alternative sampler that ignores very small relative weights may have improved sampling performance.

    This implementation is based on the algorithm in:

    Feras A. Saad, Cameron E. Freer, Martin C. Rinard, and Vikash K. Mansinghka. The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions. In AISTATS 2020: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research 108, Palermo, Sicily, Italy, 2020.

    Sampling uses UniformRandomProvider.nextInt() as the source of random bits.

    Since:
    1.5
    See Also:
    Saad et al (2020) Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, PMLR 108:1036-1046.
    • Method Detail

      • of

        public static FastLoadedDiceRollerDiscreteSampler of​(UniformRandomProvider rng,
                                                             long[] frequencies)
        Creates a sampler.

        Note: The discrete distribution generating (DDG) tree requires (n + 1) * k entries where n is the number of categories, k == ceil(log2(m)) and m is the sum of the observed frequencies. An exception is raised if this cannot be allocated as a single array.

        For reference the sum is limited to Long.MAX_VALUE and the value k to 63. The number of categories is limited to approximately ((2^31 - 1) / k) = 34,087,042 when the sum of frequencies is large enough to create k=63.

        Parameters:
        rng - Generator of uniformly distributed random numbers.
        frequencies - Observed frequencies of the discrete distribution.
        Returns:
        the sampler
        Throws:
        IllegalArgumentException - if frequencies is null or empty, a frequency is negative, the sum of all frequencies is either zero or above Long.MAX_VALUE, or the size of the discrete distribution generating tree is too large.
      • of

        public static FastLoadedDiceRollerDiscreteSampler of​(UniformRandomProvider rng,
                                                             double[] weights)
        Creates a sampler.

        Weights are converted to rational numbers p / q where q is a power of 2. The numerators p are scaled to use a common denominator before summing.

        All weights are used to create the sampler. Weights with a small magnitude relative to the largest weight can be excluded using the constructor method with the relative magnitude parameter alpha (see of(UniformRandomProvider, double[], int)).

        Parameters:
        rng - Generator of uniformly distributed random numbers.
        weights - Weights of the discrete distribution.
        Returns:
        the sampler
        Throws:
        IllegalArgumentException - if weights is null or empty, a weight is negative, infinite or NaN, the sum of all weights is zero, or the size of the discrete distribution generating tree is too large.
        See Also:
        of(UniformRandomProvider, double[], int)
      • of

        public static FastLoadedDiceRollerDiscreteSampler of​(UniformRandomProvider rng,
                                                             double[] weights,
                                                             int alpha)
        Creates a sampler.

        Weights are converted to rational numbers p / q where q is a power of 2. The numerators p are scaled to use a common denominator before summing.

        Note: The discrete distribution generating (DDG) tree requires (n + 1) * k entries where n is the number of categories, k == ceil(log2(m)) and m is the sum of the weight numerators q. An exception is raised if this cannot be allocated as a single array.

        For reference the value k is equal to or greater than the ratio of the largest to the smallest weight expressed as a power of 2. For Double.MAX_VALUE / Double.MIN_VALUE this is ~2098. The value k increases with the sum of the weight numerators. A number of weights in excess of 1,000,000 with values equal to Double.MAX_VALUE would be required to raise an exception when the minimum weight is Double.MIN_VALUE.

        Weights with a small magnitude relative to the largest weight can be excluded using the relative magnitude parameter alpha. This will set any weight to zero if the magnitude is approximately 2alpha smaller than the largest weight. This comparison is made using only the exponent of the input weights. The alpha parameter is ignored if not above zero. Note that a small alpha parameter will exclude more weights than a large alpha parameter.

        The alpha parameter can be used to exclude categories that have a very low probability of occurrence and will improve the construction performance of the sampler. The effect on sampling performance depends on the relative weights of the excluded categories; typically a high alpha is used to exclude categories that would be visited with a very low probability and the sampling performance is unchanged.

        Implementation Note

        This method creates a sampler with exact samples from the specified probability distribution. It is recommended to use this method:

        • if the weights are computed, for example from a probability mass function; or
        • if the weights sum to an infinite value.

        If the weights are computed from empirical observations then it is recommended to use the factory method accepting frequencies. This requires the total number of observations to be representable as a long integer.

        Note that if all weights are scaled by a power of 2 to be integers, and each integer can be represented as a positive 64-bit long value, then the sampler created using this method will match the output from a sampler created with the scaled weights converted to long values for the factory method accepting frequencies. This assumes the sum of the integer values does not overflow.

        It should be noted that the conversion of weights to rational numbers has a performance overhead during construction (sampling performance is not affected). This may be avoided by first converting them to integer values that can be summed without overflow. For example by scaling values by 2^62 / sum and converting to long by casting or rounding.

        This approach may increase the efficiency of construction. The resulting sampler may no longer produce exact samples from the distribution. In particular any weights with a converted frequency of zero cannot be sampled.

        Parameters:
        rng - Generator of uniformly distributed random numbers.
        weights - Weights of the discrete distribution.
        alpha - Alpha parameter.
        Returns:
        the sampler
        Throws:
        IllegalArgumentException - if weights is null or empty, a weight is negative, infinite or NaN, the sum of all weights is zero, or the size of the discrete distribution generating tree is too large.
        See Also:
        of(UniformRandomProvider, long[])