org.apache.commons.math3.distribution

## Class ZipfDistribution

• All Implemented Interfaces:
Serializable, IntegerDistribution

public class ZipfDistribution
extends AbstractIntegerDistribution
Implementation of the Zipf distribution.

Parameters: For a random variable X whose values are distributed according to this distribution, the probability mass function is given by

   P(X = k) = 1 / (H(N,s) * k^s)    for k = 1,2,...,N.

H(N,s) is the normalizing constant: the generalized harmonic number of order N with exponent s, that is, the sum of 1 / k^s for k = 1,2,...,N.

• N is the number of elements
• s is the exponent
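The definition above can be sketched in plain Java. This is a minimal illustration, not the Commons Math source: the class `ZipfPmfSketch` and its methods `generalizedHarmonic` and `pmf` are hypothetical names, and the sketch assumes the probability mass function is normalized by dividing 1 / k^s by H(N,s) so that the probabilities sum to 1.

```java
/** Plain-Java sketch of the Zipf PMF (hypothetical helper, not the library code). */
final class ZipfPmfSketch {

    /** Generalized harmonic number H(n, s) = sum of 1 / k^s for k = 1..n. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        // For s > 0 the terms shrink as k grows; adding the small ones first
        // limits floating-point rounding error.
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** P(X = k) for k in 1..n, and 0 outside the support. */
    static double pmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return 0.0;
        }
        return (1.0 / Math.pow(k, s)) / generalizedHarmonic(n, s);
    }

    public static void main(String[] args) {
        int n = 10;
        double s = 1.0;
        double total = 0.0;
        for (int k = 1; k <= n; k++) {
            double p = pmf(k, n, s);
            total += p;
            System.out.printf("P(X = %d) = %.6f%n", k, p);
        }
        // The probabilities should add up to 1 (up to rounding).
        System.out.printf("sum = %.6f%n", total);
    }
}
```

Recomputing H(N,s) on every call keeps the sketch short; code that evaluates the PMF repeatedly would likely cache the constant.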
See Also:
Zipf's law (Wikipedia), Generalized harmonic numbers, Serialized Form

• ### Fields inherited from class org.apache.commons.math3.distribution.AbstractIntegerDistribution

random, randomData
• ### Constructor Summary

Constructors
Constructor and Description
ZipfDistribution(int numberOfElements, double exponent)
Create a new Zipf distribution with the given number of elements and exponent.
ZipfDistribution(RandomGenerator rng, int numberOfElements, double exponent)
Creates a Zipf distribution.
• ### Method Summary

Methods
Modifier and Type Method and Description
protected double calculateNumericalMean()
protected double calculateNumericalVariance()
double cumulativeProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x).
double getExponent()
Get the exponent characterizing the distribution.
int getNumberOfElements()
Get the number of elements (e.g. corpus size) for the distribution.
double getNumericalMean()
Use this method to get the numerical value of the mean of this distribution.
double getNumericalVariance()
Use this method to get the numerical value of the variance of this distribution.
int getSupportLowerBound()
Access the lower bound of the support.
int getSupportUpperBound()
Access the upper bound of the support.
boolean isSupportConnected()
Use this method to get information about whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support.
double logProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
double probability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x).
int sample()
Generate a random value sampled from this distribution.
• ### Methods inherited from class org.apache.commons.math3.distribution.AbstractIntegerDistribution

cumulativeProbability, inverseCumulativeProbability, reseedRandomGenerator, sample, solveInverseCumulativeProbability
• ### Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• ### Constructor Detail

• #### ZipfDistribution

public ZipfDistribution(int numberOfElements,
double exponent)
Create a new Zipf distribution with the given number of elements and exponent.

Note: this constructor will implicitly create an instance of Well19937c as random generator to be used for sampling only (see sample() and AbstractIntegerDistribution.sample(int)). In case no sampling is needed for the created distribution, it is advised to pass null as random generator via the appropriate constructors to avoid the additional initialisation overhead.

Parameters:
numberOfElements - Number of elements.
exponent - Exponent.
Throws:
NotStrictlyPositiveException - if numberOfElements <= 0 or exponent <= 0.
• #### ZipfDistribution

public ZipfDistribution(RandomGenerator rng,
int numberOfElements,
double exponent)
throws NotStrictlyPositiveException
Creates a Zipf distribution.
Parameters:
rng - Random number generator.
numberOfElements - Number of elements.
exponent - Exponent.
Throws:
NotStrictlyPositiveException - if numberOfElements <= 0 or exponent <= 0.
Since:
3.1
• ### Method Detail

• #### getNumberOfElements

public int getNumberOfElements()
Get the number of elements (e.g. corpus size) for the distribution.
Returns:
the number of elements
• #### getExponent

public double getExponent()
Get the exponent characterizing the distribution.
Returns:
the exponent
• #### probability

public double probability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
Parameters:
x - the point at which the PMF is evaluated
Returns:
the value of the probability mass function at x
• #### logProbability

public double logProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm. In other words, this method represents the logarithm of the probability mass function (PMF) for the distribution. Note that due to the floating point precision and under/overflow issues, this method will for some distributions be more precise and faster than computing the logarithm of IntegerDistribution.probability(int).

The default implementation simply computes the logarithm of probability(x).

Overrides:
logProbability in class AbstractIntegerDistribution
Parameters:
x - the point at which the PMF is evaluated
Returns:
the logarithm of the value of the probability mass function at x
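The precision remark above can be illustrated with a small plain-Java sketch. It is an assumption-laden illustration rather than the library's code: the class `ZipfLogPmfSketch` and its methods are hypothetical, and the PMF is taken to be 1 / (H(N,s) * k^s). Working with s * log(k) instead of k^s keeps the result finite in cases where k^s overflows a double.

```java
/** Sketch comparing a direct log-PMF with the logarithm of the PMF value. */
final class ZipfLogPmfSketch {

    /** Generalized harmonic number H(n, s) = sum of 1 / k^s for k = 1..n. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** log P(X = k) computed as -s * log(k) - log(H(n, s)). */
    static double logPmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return Double.NEGATIVE_INFINITY;
        }
        return -s * Math.log(k) - Math.log(generalizedHarmonic(n, s));
    }

    /** Naive variant: evaluate the PMF first, then take its logarithm. */
    static double logOfPmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return Double.NEGATIVE_INFINITY;
        }
        double p = (1.0 / Math.pow(k, s)) / generalizedHarmonic(n, s);
        return Math.log(p);
    }

    public static void main(String[] args) {
        // With a large exponent, k^s overflows to infinity, the PMF value
        // underflows to 0, and its logarithm becomes negative infinity.
        int n = 1000;
        double s = 200.0;
        System.out.println("direct log-PMF: " + logPmf(n, n, s));
        System.out.println("log of PMF:     " + logOfPmf(n, n, s));
    }
}
```

For moderate parameters the two variants agree to within rounding; the difference only shows up near the limits of the double range.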
• #### cumulativeProbability

public double cumulativeProbability(int x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
Parameters:
x - the point at which the CDF is evaluated
Returns:
the probability that a random variable with this distribution takes a value less than or equal to x
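Because the PMF is proportional to 1 / k^s, summing it up to x gives a ratio of two generalized harmonic numbers. The following plain-Java sketch shows that relationship; the class `ZipfCdfSketch` and its methods are hypothetical, and nothing here is claimed to match the library's internal code.

```java
/** Sketch of the Zipf CDF as a ratio of generalized harmonic numbers. */
final class ZipfCdfSketch {

    /** Generalized harmonic number H(n, s) = sum of 1 / k^s for k = 1..n. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** P(X <= x) for a distribution with n elements and exponent s. */
    static double cdf(int x, int n, double s) {
        if (x < 1) {
            return 0.0; // below the support
        }
        if (x >= n) {
            return 1.0; // at or above the upper bound of the support
        }
        return generalizedHarmonic(x, s) / generalizedHarmonic(n, s);
    }

    public static void main(String[] args) {
        int n = 5;
        double s = 1.0;
        for (int x = 0; x <= n; x++) {
            System.out.printf("P(X <= %d) = %.6f%n", x, cdf(x, n, s));
        }
    }
}
```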
• #### getNumericalMean

public double getNumericalMean()
Use this method to get the numerical value of the mean of this distribution. For number of elements N and exponent s, the mean is Hs1 / Hs, where
• Hs1 = generalizedHarmonic(N, s - 1),
• Hs = generalizedHarmonic(N, s).
Returns:
the mean or Double.NaN if it is not defined
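The ratio Hs1 / Hs follows directly from the definition of the expectation. A short derivation, writing H_{N,m} for generalizedHarmonic(N, m) and assuming the PMF is P(X = k) = 1 / (H_{N,s} k^s):

```latex
\mathbb{E}[X]
  = \sum_{k=1}^{N} k \, P(X = k)
  = \frac{1}{H_{N,s}} \sum_{k=1}^{N} \frac{k}{k^{s}}
  = \frac{1}{H_{N,s}} \sum_{k=1}^{N} \frac{1}{k^{s-1}}
  = \frac{H_{N,s-1}}{H_{N,s}},
\qquad
H_{N,m} = \sum_{k=1}^{N} \frac{1}{k^{m}}.
```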
• #### calculateNumericalMean

protected double calculateNumericalMean()
Returns:
the mean of this distribution
• #### getNumericalVariance

public double getNumericalVariance()
Use this method to get the numerical value of the variance of this distribution. For number of elements N and exponent s, the variance is (Hs2 / Hs) - (Hs1^2 / Hs^2), where
• Hs2 = generalizedHarmonic(N, s - 2),
• Hs1 = generalizedHarmonic(N, s - 1),
• Hs = generalizedHarmonic(N, s).
Returns:
the variance (possibly Double.POSITIVE_INFINITY or Double.NaN if it is not defined)
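The formula above is E[X^2] - E[X]^2 written in terms of generalized harmonic numbers. A plain-Java sketch, using the hypothetical class `ZipfMomentsSketch` (not part of Commons Math), evaluates it and prints a direct summation over the support for comparison:

```java
/** Sketch of the Zipf mean and variance via generalized harmonic numbers. */
final class ZipfMomentsSketch {

    /** Generalized harmonic number H(n, m) = sum of k^(-m) for k = 1..n. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += Math.pow(k, -m);
        }
        return sum;
    }

    /** Mean: Hs1 / Hs. */
    static double mean(int n, double s) {
        return generalizedHarmonic(n, s - 1) / generalizedHarmonic(n, s);
    }

    /** Variance: (Hs2 / Hs) - (Hs1^2 / Hs^2). */
    static double variance(int n, double s) {
        double hs2 = generalizedHarmonic(n, s - 2);
        double hs1 = generalizedHarmonic(n, s - 1);
        double hs = generalizedHarmonic(n, s);
        return (hs2 / hs) - (hs1 * hs1) / (hs * hs);
    }

    /** Variance by direct summation of (k - mean)^2 * P(X = k). */
    static double varianceBySummation(int n, double s) {
        double hs = generalizedHarmonic(n, s);
        double mu = mean(n, s);
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            double p = Math.pow(k, -s) / hs;
            sum += (k - mu) * (k - mu) * p;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;
        double s = 1.5;
        System.out.println("mean               = " + mean(n, s));
        System.out.println("variance (formula) = " + variance(n, s));
        System.out.println("variance (direct)  = " + varianceBySummation(n, s));
    }
}
```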
• #### calculateNumericalVariance

protected double calculateNumericalVariance()
Returns:
the variance of this distribution
• #### getSupportLowerBound

public int getSupportLowerBound()
Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return

inf {x in Z | P(X <= x) > 0}.

The lower bound of the support is always 1 no matter the parameters.
Returns:
lower bound of the support (always 1)
• #### getSupportUpperBound

public int getSupportUpperBound()
Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return

inf {x in Z | P(X <= x) = 1}.

The upper bound of the support is the number of elements.
Returns:
upper bound of the support
• #### isSupportConnected

public boolean isSupportConnected()
Use this method to get information about whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support. The support of this distribution is connected.
Returns:
true
• #### sample

public int sample()
Generate a random value sampled from this distribution. The default implementation uses the inversion method.
Specified by:
sample in interface IntegerDistribution
Overrides:
sample in class AbstractIntegerDistribution
Returns:
a random value
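The inversion method mentioned above draws a uniform value u in [0, 1) and returns the smallest x whose cumulative probability reaches u. The sketch below illustrates the idea in plain Java with `java.util.Random`; the class `ZipfInversionSketch` and its methods are hypothetical, and this is not a claim about which algorithm a given Commons Math release actually uses in `sample()`.

```java
import java.util.Random;

/** Sketch of inversion sampling for a Zipf distribution (illustrative only). */
final class ZipfInversionSketch {

    /** Smallest x in 1..n with P(X <= x) >= u, found by accumulating the PMF. */
    static int inverseCdf(double u, int n, double s) {
        double h = 0.0;
        for (int k = 1; k <= n; k++) {
            h += Math.pow(k, -s);
        }
        double cumulative = 0.0;
        for (int x = 1; x < n; x++) {
            cumulative += Math.pow(x, -s) / h;
            if (cumulative >= u) {
                return x;
            }
        }
        // Reached only when no smaller x qualifies; also guards against
        // rounding leaving the running sum slightly below 1.
        return n;
    }

    /** Draw one value by feeding a uniform variate into the inverse CDF. */
    static int sample(Random rng, int n, double s) {
        return inverseCdf(rng.nextDouble(), n, s);
    }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed so runs are repeatable
        int n = 5;
        double s = 1.0;
        int[] counts = new int[n + 1];
        for (int i = 0; i < 10_000; i++) {
            counts[sample(rng, n, s)]++;
        }
        for (int x = 1; x <= n; x++) {
            System.out.printf("x = %d drawn %d times%n", x, counts[x]);
        }
    }
}
```

The linear scan costs O(N) per draw, which is acceptable for a sketch but slow for large N; precomputing the cumulative table once and searching it is one common alternative.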