org.apache.commons.math3.random

## Class EmpiricalDistribution

• All Implemented Interfaces:
Serializable

```public class EmpiricalDistribution
extends Object
implements Serializable```
Represents an empirical probability distribution -- a probability distribution derived from observed data without making any assumptions about the functional form of the population distribution that the data come from.

An `EmpiricalDistribution` maintains data structures, called distribution digests, that describe empirical distributions and support the following operations:

• dividing the input data into "bin ranges" and reporting bin frequency counts (data for histogram)
• reporting univariate statistics describing the full set of data values as well as the observations within each bin
• generating random values from the distribution
Applications can use `EmpiricalDistribution` to build grouped frequency histograms representing the input data or to generate random values "like" those in the input file -- i.e., the values generated will follow the distribution of the values in the file.

The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:

Digesting the input file

1. Pass over the file once to compute min and max.
2. Divide the range from min to max into `binCount` "bins."
3. Pass over the data file again, computing bin counts and univariate statistics (mean, std dev.) for each of the bins.
4. Divide the interval (0,1) into subintervals associated with the bins, with the length of a bin's subinterval proportional to its count.
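
Steps 1 to 3 of the digest can be sketched with the JDK alone. This is a minimal illustration, not the Commons Math source: the class name `DigestSketch` and its fields are hypothetical, and it assumes a non-empty in-memory array, equal-width bins, and a sample standard deviation that is reported as 0 for bins holding fewer than two values.

```java
/** Hypothetical illustration of the two-pass digest; not the Commons Math source. */
final class DigestSketch {
    final double min;
    final double max;
    final long[] counts;     // observations per bin
    final double[] means;    // mean of the observations in each bin
    final double[] stdDevs;  // sample std dev per bin (0 when a bin holds fewer than 2 values)

    DigestSketch(double[] data, int binCount) {
        // Pass 1: find min and max.
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double x : data) {
            lo = Math.min(lo, x);
            hi = Math.max(hi, x);
        }
        min = lo;
        max = hi;

        // Assumption: the range is split into bins of equal width.
        double delta = (max - min) / binCount;
        counts = new long[binCount];
        means = new double[binCount];
        stdDevs = new double[binCount];
        double[] m2 = new double[binCount]; // running sum of squared deviations per bin

        // Pass 2: bin counts plus running mean and variance (Welford's update).
        for (double x : data) {
            int b = binIndex(x, delta);
            counts[b]++;
            double d = x - means[b];
            means[b] += d / counts[b];
            m2[b] += d * (x - means[b]);
        }
        for (int b = 0; b < binCount; b++) {
            stdDevs[b] = counts[b] > 1 ? Math.sqrt(m2[b] / (counts[b] - 1)) : 0.0;
        }
    }

    /** Maps x to a bin: bin 0 is closed on both ends, later bins are open on the left. */
    private int binIndex(double x, double delta) {
        int raw = (int) Math.ceil((x - min) / delta) - 1;
        return Math.min(Math.max(raw, 0), counts.length - 1);
    }
}
```

For example, `new DigestSketch(new double[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 3)` places the values 1 to 4 in the first bin, 5 to 7 in the second, and 8 to 10 in the third.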

Generating random values from the distribution

1. Generate a uniformly distributed value in (0,1).
2. Select the subinterval to which the value belongs.
3. Generate a random Gaussian value with mean = mean of the associated bin and std dev = std dev of associated bin.
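
The three generation steps can be sketched as follows. This is an illustrative JDK-only version rather than the library's code: `SamplingSketch`, `selectBin`, and `nextValue` are hypothetical names, and the arrays of subinterval upper bounds, bin means, and bin standard deviations are assumed to have been computed by an earlier digest.

```java
import java.util.Random;

/** Hypothetical illustration of the generation steps; not the Commons Math source. */
final class SamplingSketch {

    /** Step 2: index of the first subinterval whose upper bound is at least u. */
    static int selectBin(double[] generatorUpperBounds, double u) {
        for (int i = 0; i < generatorUpperBounds.length; i++) {
            if (u <= generatorUpperBounds[i]) {
                return i;
            }
        }
        // Guard against rounding in the last bound.
        return generatorUpperBounds.length - 1;
    }

    /** Steps 1 to 3: uniform draw, subinterval lookup, Gaussian draw for the selected bin. */
    static double nextValue(double[] generatorUpperBounds,
                            double[] binMeans,
                            double[] binStdDevs,
                            Random rng) {
        double u = rng.nextDouble();                  // step 1: uniform value in [0,1)
        int bin = selectBin(generatorUpperBounds, u); // step 2: pick the subinterval
        // Step 3: Gaussian with the bin's mean and standard deviation.
        return binMeans[bin] + binStdDevs[bin] * rng.nextGaussian();
    }
}
```

Because a subinterval's length is proportional to its bin count, heavily populated bins are selected more often, which is what makes the generated values follow the shape of the input data.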

USAGE NOTES:

• The `binCount` defaults to 1000. A good rule of thumb is to set the bin count to approximately the number of entries in the input file divided by 10.
• The input file must be a plain text file containing one valid numeric entry per line.
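
The two usage notes can be illustrated with a small JDK-only sketch. The names `InputSketch`, `parseValues`, and `suggestedBinCount` are hypothetical, a string stands in for the contents of the input file, and rejecting blank or malformed lines with `NumberFormatException` is a choice of this sketch rather than documented library behavior.

```java
/** Hypothetical helper illustrating the input format and the bin-count rule of thumb. */
final class InputSketch {

    /** Parses text that holds one numeric entry per line. */
    static double[] parseValues(String text) {
        String[] lines = text.split("\\R"); // split on any line terminator
        double[] values = new double[lines.length];
        for (int i = 0; i < lines.length; i++) {
            // Throws NumberFormatException for a line that is not a valid number.
            values[i] = Double.parseDouble(lines[i].trim());
        }
        return values;
    }

    /** Rule of thumb: about one bin per ten entries, with at least one bin. */
    static int suggestedBinCount(int entryCount) {
        return Math.max(1, entryCount / 10);
    }
}
```

With 2500 entries this suggests 250 bins, which could then be passed as `binCount` to a constructor that accepts it.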

Version:
$Id: EmpiricalDistribution.java 1244107 2012-02-14 16:17:55Z erans $
See Also:
Serialized Form
• ### Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| `static int` | `DEFAULT_BIN_COUNT` | Default bin count |

• ### Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| `EmpiricalDistribution()` | Creates a new EmpiricalDistribution with the default bin count. |
| `EmpiricalDistribution(int binCount)` | Creates a new EmpiricalDistribution with the specified bin count. |
| `EmpiricalDistribution(int binCount, RandomDataImpl randomData)` | Creates a new EmpiricalDistribution with the specified bin count using the provided `RandomDataImpl` instance as the source of random data. |
| `EmpiricalDistribution(int binCount, RandomGenerator generator)` | Creates a new EmpiricalDistribution with the specified bin count using the provided `RandomGenerator` as the source of random data. |
| `EmpiricalDistribution(RandomDataImpl randomData)` | Creates a new EmpiricalDistribution with default bin count using the provided `RandomDataImpl` as the source of random data. |
| `EmpiricalDistribution(RandomGenerator generator)` | Creates a new EmpiricalDistribution with default bin count using the provided `RandomGenerator` as the source of random data. |

• ### Method Summary

Methods

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `int` | `getBinCount()` | Returns the number of bins. |
| `List<SummaryStatistics>` | `getBinStats()` | Returns a List of `SummaryStatistics` instances containing statistics describing the values in each of the bins. |
| `double[]` | `getGeneratorUpperBounds()` | Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. |
| `double` | `getNextValue()` | Generates a random value from this distribution. |
| `StatisticalSummary` | `getSampleStats()` | Returns a `StatisticalSummary` describing this distribution. |
| `double[]` | `getUpperBounds()` | Returns a fresh copy of the array of upper bounds for the bins. |
| `boolean` | `isLoaded()` | Property indicating whether or not the distribution has been loaded. |
| `void` | `load(double[] in)` | Computes the empirical distribution from the provided array of numbers. |
| `void` | `load(File file)` | Computes the empirical distribution from the input file. |
| `void` | `load(URL url)` | Computes the empirical distribution using data read from a URL. |
| `void` | `reSeed(long seed)` | Reseeds the random number generator used by `getNextValue()`. |

• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Field Detail

• #### DEFAULT_BIN_COUNT

`public static final int DEFAULT_BIN_COUNT`
Default bin count
See Also:
Constant Field Values
• ### Constructor Detail

• #### EmpiricalDistribution

`public EmpiricalDistribution()`
Creates a new EmpiricalDistribution with the default bin count.
• #### EmpiricalDistribution

`public EmpiricalDistribution(int binCount)`
Creates a new EmpiricalDistribution with the specified bin count.
Parameters:
`binCount` - number of bins
• #### EmpiricalDistribution

```public EmpiricalDistribution(int binCount,
RandomGenerator generator)```
Creates a new EmpiricalDistribution with the specified bin count using the provided `RandomGenerator` as the source of random data.
Parameters:
`binCount` - number of bins
`generator` - random data generator (may be null, resulting in default JDK generator)
Since:
3.0
• #### EmpiricalDistribution

`public EmpiricalDistribution(RandomGenerator generator)`
Creates a new EmpiricalDistribution with default bin count using the provided `RandomGenerator` as the source of random data.
Parameters:
`generator` - random data generator (may be null, resulting in default JDK generator)
Since:
3.0
• #### EmpiricalDistribution

```public EmpiricalDistribution(int binCount,
RandomDataImpl randomData)```
Creates a new EmpiricalDistribution with the specified bin count using the provided `RandomDataImpl` instance as the source of random data.
Parameters:
`binCount` - number of bins
`randomData` - random data generator (may be null, resulting in default JDK generator)
Since:
3.0
• #### EmpiricalDistribution

`public EmpiricalDistribution(RandomDataImpl randomData)`
Creates a new EmpiricalDistribution with default bin count using the provided `RandomDataImpl` as the source of random data.
Parameters:
`randomData` - random data generator (may be null, resulting in default JDK generator)
Since:
3.0
• ### Method Detail

• #### load

```public void load(double[] in)
throws NullArgumentException```
Computes the empirical distribution from the provided array of numbers.
Parameters:
`in` - the input data array
Throws:
`NullArgumentException` - if `in` is null

• #### load

```public void load(URL url)
throws IOException,
NullArgumentException```
Computes the empirical distribution using data read from a URL.
Parameters:
`url` - URL of the input file
Throws:
`IOException` - if an IO error occurs
`NullArgumentException` - if `url` is null

• #### load

```public void load(File file)
throws IOException,
NullArgumentException```
Computes the empirical distribution from the input file.
Parameters:
`file` - the input file
Throws:
`IOException` - if an IO error occurs
`NullArgumentException` - if `file` is null
• #### getNextValue

```public double getNextValue()
throws MathIllegalStateException```
Generates a random value from this distribution. Preconditions:
• the distribution must be loaded before invoking this method
Returns:
the random value.
Throws:
`MathIllegalStateException` - if the distribution has not been loaded
• #### getSampleStats

`public StatisticalSummary getSampleStats()`
Returns a `StatisticalSummary` describing this distribution. Preconditions:
• the distribution must be loaded before invoking this method
Returns:
the sample statistics
Throws:
`IllegalStateException` - if the distribution has not been loaded
• #### getBinCount

`public int getBinCount()`
Returns the number of bins.
Returns:
the number of bins.
• #### getBinStats

`public List<SummaryStatistics> getBinStats()`
Returns a List of `SummaryStatistics` instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.
Returns:
List of bin statistics.
• #### getUpperBounds

`public double[] getUpperBounds()`

Returns a fresh copy of the array of upper bounds for the bins. Bins are:
`[min, upperBounds[0]], (upperBounds[0], upperBounds[1]], ..., (upperBounds[binCount-2], upperBounds[binCount-1] = max]`.

Note: In versions 1.0-2.0 of commons-math, this method incorrectly returned the array of probability generator upper bounds now returned by `getGeneratorUpperBounds()`.

Returns:
array of bin upper bounds
Since:
2.1
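
The bin layout described for `getUpperBounds()` can be sketched as below. This is an illustration under the assumption that bins have equal width; `BinBoundsSketch`, `upperBounds`, and `binOf` are hypothetical names, not library methods.

```java
/** Hypothetical illustration of the bin upper bounds; not the Commons Math source. */
final class BinBoundsSketch {

    /** Upper bounds of binCount equal-width bins over [min, max]; the last bound is max itself. */
    static double[] upperBounds(double min, double max, int binCount) {
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        bounds[binCount - 1] = max; // pin the last bound to max to avoid rounding drift
        return bounds;
    }

    /** Index of the bin containing x: each bin includes its upper bound, as in (a, b]. */
    static int binOf(double x, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (x <= bounds[i]) {
                return i;
            }
        }
        return bounds.length - 1;
    }
}
```

For `min = 0`, `max = 10`, and four bins, the bounds are 2.5, 5.0, 7.5, and 10.0, so a value of exactly 2.5 belongs to the first bin and 2.6 to the second.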
• #### getGeneratorUpperBounds

`public double[] getGeneratorUpperBounds()`

Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.

In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned by `getUpperBounds()`.

Returns:
array of upper bounds of subintervals used in data generation
Since:
2.1
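
The relationship between bin counts and the generator upper bounds can be sketched as a cumulative fraction. The class and method names here are hypothetical, and the code is an illustration of the "lengths proportional to bin counts" rule rather than the library's implementation.

```java
/** Hypothetical illustration of the generator upper bounds; not the Commons Math source. */
final class GeneratorBoundsSketch {

    /** Cumulative fraction of observations up to and including each bin. */
    static double[] generatorUpperBounds(long[] binCounts) {
        long total = 0;
        for (long c : binCounts) {
            total += c;
        }
        double[] bounds = new double[binCounts.length];
        long running = 0;
        for (int i = 0; i < binCounts.length; i++) {
            running += binCounts[i];
            bounds[i] = (double) running / total;
        }
        bounds[bounds.length - 1] = 1.0; // the last subinterval always ends at 1
        return bounds;
    }
}
```

Bin counts of 4, 3, and 3 give bounds of approximately 0.4, 0.7, and 1.0, so the first bin owns 40% of the unit interval. An empty bin gets a zero-length subinterval and is effectively never selected.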

• #### isLoaded

`public boolean isLoaded()`
Property indicating whether or not the distribution has been loaded.
Returns:
true if the distribution has been loaded
• #### reSeed

`public void reSeed(long seed)`
Reseeds the random number generator used by `getNextValue()`.
Parameters:
`seed` - random generator seed
Since:
3.0