Apache Commons Statistics User Guide
Overview
Apache Commons Statistics provides utilities for statistical applications. The code
originated in the
commons-math project but was pulled out into a separate project for better
maintainability and has since undergone numerous improvements.
Commons Statistics is divided into a number of submodules:
- commons-statistics-descriptive: univariate descriptive statistics
- commons-statistics-distribution: probability distributions
- commons-statistics-inference: hypothesis testing
- commons-statistics-interval: statistical intervals
- commons-statistics-ranking: rank transformations
Example Modules
In addition to the modules above, the Commons Statistics
source distribution
contains example code demonstrating library functionality and/or providing useful
development utilities. These modules are not part of the public API of the library and no
guarantees are made concerning backwards compatibility. The
example module parent page
contains a listing of available modules.
Descriptive Statistics
The commons-statistics-descriptive module provides descriptive statistics.
Overview
The module provides classes to compute univariate statistics on double,
int and long data using array input or a Java stream. The
result is returned as a
StatisticResult.
The StatisticResult provides methods to supply the result as a
double, int, long and BigInteger.
The integer types allow the exact result to be returned for integer data. For example,
the sum of long values may not be exactly representable as a
double and may overflow a long.
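As a sketch of an exact integer result (assuming a LongSum class, named by analogy
with the integer statistic classes used in the examples below):
long[] values = {Long.MAX_VALUE, Long.MAX_VALUE};
// The exact sum overflows a long and is not exactly representable as a double
BigInteger sum = LongSum.of(values).getAsBigInteger();
// sum == 2 * Long.MAX_VALUE == 18446744073709551614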
Computation of an individual statistic involves creating an instance of
StatisticResult that can supply the current statistic value.
To allow addition of single values to update the statistic, instances
implement the primitive consumer interface for the supported type:
DoubleConsumer, IntConsumer, or LongConsumer.
Instances implement the
StatisticAccumulator
interface and can be combined with other instances. This allows computation in parallel on
subsets of data and combination to a final result. This can be performed using the
Java stream API.
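For example, a single statistic can be accumulated and combined across a parallel
stream (a minimal sketch assuming a Mean class for double data, by analogy
with the IntMean class used in the examples below):
// Each worker accumulates with accept; partial results are merged with combine
double mean = DoubleStream.of(1, 2, 3, 4)
    .parallel()
    .collect(Mean::create, Mean::accept, Mean::combine)
    .getAsDouble();
// mean == 2.5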
Computation of multiple statistics uses a
Statistic
enumeration to define the statistics to evaluate. A container class is created to
compute the desired statistics together and allows multiple statistics to be computed
concurrently using the Java stream API. Each statistic result is obtained using the
Statistic enum to access the required value. Providing a choice of the
statistics allows the user to avoid the computational cost of results that are not
required.
Note that double computations are subject to accumulated floating-point
rounding, which can produce different results for permuted input data. Computation
on an array of double data can use a multiple-pass algorithm to increase
accuracy over a single-pass stream of values. This is the recommended approach if
all the data is already stored in an array (i.e. is not dynamically generated).
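A sketch of the two approaches (assuming the Variance class for double data;
the array and stream results may differ in the final bits):
double[] data = {1e16, 1, -1e16, 1};
// Array input: may use a multiple-pass algorithm for improved accuracy
double v1 = Variance.of(data).getAsDouble();
// Stream input: single-pass accumulation
double v2 = Arrays.stream(data)
    .collect(Variance::create, Variance::accept, Variance::combine)
    .getAsDouble();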
If the data is an integer type then it is
preferred to use the integer specializations of the statistics.
Many implementations use exact integer math for the computation. This is faster than
using a double data type, is more accurate, and returns the same result
irrespective of the input order of the data. Note that for improved performance there
is no use of BigInteger in the accumulation of intermediate values; the
computation uses mutable fixed-precision integer classes for totals that may
overflow 64 bits.
Some statistics cannot be computed using a stream since they require all values for
computation, for example the median. These are evaluated on an array using an instance
of a computing class. The instance allows computation options to be changed. Instances
are immutable and the computation is thread-safe.
Examples
Computation of a single statistic from an array of values, an array range or a
stream of data:
int[] values = {1, 1, 2, 3, 5, 8, 13, 21};
double v = IntVariance.of(values).getAsDouble();
// Range uses inclusive start and exclusive end
int max = IntMax.ofRange(values, 3, 6).getAsInt(); // 8
double m = Stream.of("one", "two", "three", "four")
    .mapToInt(String::length)
    .collect(IntMean::create, IntMean::accept, IntMean::combine)
    .getAsDouble();
Computation of multiple statistics uses the Statistic enum.
These can be specified using an EnumSet together with the input array data.
Note that some statistics share the same underlying computation, for example the variance,
standard deviation and mean. When a container class is constructed using one of the
statistics, the other co-computed statistics are available in the result even if not
specified during construction. The isSupported method can
identify all results that are available from the container class.
double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
DoubleStatistics stats = DoubleStatistics.of(
    EnumSet.of(Statistic.MIN, Statistic.MAX, Statistic.VARIANCE),
    data);
stats.getAsDouble(Statistic.MIN);      // 1.0
stats.getAsDouble(Statistic.MAX);      // 8.0
stats.getAsDouble(Statistic.VARIANCE); // 6.0
// Get other statistics supported by the underlying computations
stats.isSupported(Statistic.STANDARD_DEVIATION); // true
stats.getAsDouble(Statistic.STANDARD_DEVIATION); // 2.449...
Computation of multiple statistics on individual values can accumulate the results
using the accept method of the container class:
IntStatistics stats = IntStatistics.of(
    Statistic.MIN, Statistic.MAX, Statistic.MEAN);
Stream.of("one", "two", "three", "four")
    .mapToInt(String::length)
    .forEach(stats::accept);
stats.getAsInt(Statistic.MIN); // 3
stats.getAsInt(Statistic.MAX); // 5
stats.getAsDouble(Statistic.MEAN); // 15.0 / 4
Computation of multiple statistics on a stream of values in parallel.
This requires use of a Builder that
can supply instances of the container class to each worker with the
build method. These are populated using accept and then
collected using combine :
IntStatistics.Builder builder = IntStatistics.builder(
    Statistic.MIN, Statistic.MAX, Statistic.MEAN);
IntStatistics stats =
    Stream.of("one", "two", "three", "four")
        .parallel()
        .mapToInt(String::length)
        .collect(builder::build, IntConsumer::accept, IntStatistics::combine);
stats.getAsInt(Statistic.MIN); // 3
stats.getAsInt(Statistic.MAX); // 5
stats.getAsDouble(Statistic.MEAN); // 15.0 / 4
Computation on multiple arrays. This requires use of a Builder that
can supply instances of the container class to compute each array with the
build method:
double[][] data = {
    {1, 2, 3, 4},
    {5, 6, 7, 8},
};
DoubleStatistics.Builder builder = DoubleStatistics.builder(
    Statistic.MIN, Statistic.MAX, Statistic.VARIANCE);
DoubleStatistics stats = Arrays.stream(data)
    .map(builder::build)
    .reduce(DoubleStatistics::combine)
    .get();
stats.getAsDouble(Statistic.MIN);      // 1.0
stats.getAsDouble(Statistic.MAX);      // 8.0
stats.getAsDouble(Statistic.VARIANCE); // 6.0
// Get other statistics supported by the underlying computations
stats.isSupported(Statistic.MEAN); // true
stats.getAsDouble(Statistic.MEAN); // 4.5
If computation on multiple arrays is to be repeated then this can be done with a
reusable java.util.stream.Collector:
double[][] data = {
    {1, 2, 3, 4},
    {5, 6, 7, 8},
};
DoubleStatistics.Builder builder = DoubleStatistics.builder(
    Statistic.MIN, Statistic.MAX, Statistic.VARIANCE);
Collector<double[], DoubleStatistics, DoubleStatistics> collector =
    Collector.of(builder::build,
                 (s, d) -> s.combine(builder.build(d)),
                 DoubleStatistics::combine);
DoubleStatistics stats = Arrays.stream(data).collect(collector);
stats.getAsDouble(Statistic.MIN); // 1.0
stats.getAsDouble(Statistic.MAX); // 8.0
stats.getAsDouble(Statistic.VARIANCE); // 6.0
Combination of multiple statistics requires them to be compatible, i.e. all supported
statistics in one container must also be supported in the other. Note that combining another
container ignores any unsupported statistics, and the compatibility
may be asymmetric.
double[] data1 = {1, 2, 3, 4};
double[] data2 = {5, 6, 7, 8};
DoubleStatistics varStats = DoubleStatistics.builder(Statistic.VARIANCE).build(data1);
DoubleStatistics meanStats = DoubleStatistics.builder(Statistic.MEAN).build(data2);
// throws IllegalArgumentException
varStats.combine(meanStats);
// OK - mean is updated to 4.5
meanStats.combine(varStats);
Computation of a statistic that requires all data (i.e. does not support the
Stream API) uses a configurable instance of the computing class:
double[] data = {8, 7, 6, 5, 4, 3, 2, 1};
// Configure the statistic
double m = Median.withDefaults()
    .withCopy(true)        // do not modify the input array
    .with(NaNPolicy.ERROR) // raise an exception for NaN
    .evaluate(data);
// m = 4.5
Computation of multiple values of a statistic that requires all data:
int size = 10000;
double origin = 0;
double bound = 100;
double[] data =
    new SplittableRandom(123)
        .doubles(size, origin, bound)
        .toArray();
// Evaluate multiple statistics on the same data
double[] q = Quantile.withDefaults()
    .evaluate(data, 0.25, 0.5, 0.75); // probabilities
// q ~ [25.0, 50.0, 75.0]
Probability Distributions
Overview
The commons-statistics-distribution module provides a framework and implementations for some commonly used
probability distributions. Continuous univariate distributions are represented by
implementations of the
ContinuousDistribution
interface. Discrete distributions implement
DiscreteDistribution
(values must be mapped to integers).
API
The distribution framework provides the means to compute probability density,
probability mass and cumulative probability functions for several well-known
discrete (integer-valued) and continuous probability distributions.
The API also allows for the computation of inverse cumulative probabilities
and sampling from distributions.
For an instance f of a distribution F,
and a domain value x, f.cumulativeProbability(x)
computes P(X <= x) where X is a random variable distributed
as F. The complement of the cumulative probability,
f.survivalProbability(x), computes P(X > x). Note that
the survival probability is approximately equal to 1 - P(X <= x) but
does not suffer from cancellation error as the cumulative probability approaches 1.
The cancellation error may cause a (total) loss of accuracy when
P(X <= x) ~ 1
(see Complementary Probabilities).
TDistribution t = TDistribution.of(29);
double lowerTail = t.cumulativeProbability(-2.656); // P(T(29) <= -2.656)
double upperTail = t.survivalProbability(2.75); // P(T(29) > 2.75)
For discrete
F, the probability mass function is given by f.probability(x).
For continuous
F, the probability density function is given by f.density(x).
Distributions also implement f.probability(x1, x2) for computing
P(x1 < X <= x2).
PoissonDistribution pd = PoissonDistribution.of(1.23);
double p1 = pd.probability(5);
double p2 = pd.probability(5, 5);
double p3 = pd.probability(4, 5);
// p2 == 0
// p1 == p3
Inverse distribution functions can be computed using the
inverseCumulativeProbability and inverseSurvivalProbability
methods. For continuous f and a probability p,
f.inverseCumulativeProbability(p) returns
\[ x = \begin{cases}
\inf \{ x \in \mathbb R : P(X \le x) \ge p\} & \text{for } 0 \lt p \le 1 \\
\inf \{ x \in \mathbb R : P(X \le x) \gt 0 \} & \text{for } p = 0
\end{cases} \]
where X is distributed as F .
Likewise f.inverseSurvivalProbability(p) returns
\[ x = \begin{cases}
\inf \{ x \in \mathbb R : P(X \gt x) \le p\} & \text{for } 0 \le p \lt 1 \\
\inf \{ x \in \mathbb R : P(X \gt x) \lt 1 \} & \text{for } p = 1
\end{cases} \]
NormalDistribution n = NormalDistribution.of(0, 1);
double x1 = n.inverseCumulativeProbability(1e-300);
double x2 = n.inverseSurvivalProbability(1e-300);
// x1 == -x2 ~ -37.0471
For discrete F, the definition is the same, with \( \mathbb Z \)
(the integers) in place of \( \mathbb R \). Note that, in the discrete case,
the strict inequality on \( p \) in the definition can make a difference when
\( p \) is an attained value of the distribution. For example, if \( p \) is attained
at \( x \), moving to the next larger representable value of \( p \) will return
\( x + 1 \) from the inverse CDF.
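A sketch of this behaviour for a discrete distribution (the values are illustrative;
the exact floating-point behaviour depends on the computed probability):
BinomialDistribution b = BinomialDistribution.of(10, 0.5);
double p = b.cumulativeProbability(5);      // p is attained at x = 5
int x1 = b.inverseCumulativeProbability(p); // expected: 5
// The next larger representable p moves past the attained value
int x2 = b.inverseCumulativeProbability(Math.nextUp(p)); // expected: 6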
All distributions provide accessors for the parameters used to create the distribution,
and for the mean and variance. The return value when the mean or variance
is undefined is noted in the class javadoc.
ChiSquaredDistribution chi2 = ChiSquaredDistribution.of(42);
double df = chi2.getDegreesOfFreedom(); // 42
double mean = chi2.getMean(); // 42
double variance = chi2.getVariance(); // 84
CauchyDistribution cauchy = CauchyDistribution.of(1.23, 4.56);
double location = cauchy.getLocation(); // 1.23
double scale = cauchy.getScale(); // 4.56
double undefined1 = cauchy.getMean(); // NaN
double undefined2 = cauchy.getVariance(); // NaN
The supported domain of the distribution is provided by the
getSupportLowerBound and getSupportUpperBound methods.
BinomialDistribution b = BinomialDistribution.of(13, 0.15);
int lower = b.getSupportLowerBound(); // 0
int upper = b.getSupportUpperBound(); // 13
All distributions implement a createSampler(UniformRandomProvider rng)
method to support random sampling from the distribution, where UniformRandomProvider
is an interface defined in Commons RNG.
The sampler is a functional interface whose functional method is sample() ,
suitable for generation of double or int samples.
Default samples() methods are provided to create a
DoubleStream or IntStream .
// From Commons RNG Simple
UniformRandomProvider rng = RandomSource.KISS.create(123L);
NormalDistribution n = NormalDistribution.of(0, 1);
double x = n.createSampler(rng).sample();
// Generate a number of samples
GeometricDistribution g = GeometricDistribution.of(0.75);
int[] k = g.createSampler(rng).samples(100).toArray();
// k.length == 100
Note that even when distributions are immutable, the sampler is not immutable as it
depends on the instance of the mutable UniformRandomProvider . Generation of
many samples in a multi-threaded application should use a separate instance of
UniformRandomProvider per thread. Any synchronization should be avoided
for best performance. By default the streams returned from the samples()
methods are sequential.
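For example, parallel sampling might create one generator per task (a sketch; the
per-task seeding scheme here is a simple illustration, not a recommendation):
NormalDistribution n = NormalDistribution.of(0, 1);
double[][] samples = IntStream.range(0, 4)
    .parallel()
    .mapToObj(i -> {
        // A separate UniformRandomProvider instance for each task
        UniformRandomProvider rng = RandomSource.KISS.create(123L + i);
        return n.createSampler(rng).samples(1000).toArray();
    })
    .toArray(double[][]::new);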
Implementation Details
Instances are constructed using factory methods, typically a static method in the
distribution class named of . This allows the returned instance
to be specialised to the distribution parameters.
Exceptions will be raised by the factory method when constructing the distribution
using invalid parameters. See the class javadoc for exception conditions.
Unless otherwise noted, distribution instances are immutable. This allows sharing
an instance between threads for computations.
Exceptions will not be raised by distributions for an invalid x argument
to probability functions. Typically the cumulative probability functions will return
0 or 1 for an out-of-domain argument, depending on which side of the domain bound
the argument falls, and the density or probability mass functions will return 0.
Return values for x arguments where the result is
undefined should be documented in the class javadoc. For example the beta distribution
is undefined for x = 0, alpha < 1 or x = 1, beta < 1.
Note: This out-of-domain behaviour may be different from distributions in the
org.apache.commons.math3.distribution package. Users upgrading from
commons-math
should check the appropriate class javadoc.
An exception will be raised by distributions for an invalid p argument
to inverse probability functions. The argument must be in the range [0, 1] .
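A sketch consistent with the documented behaviour (using the binomial distribution
from above; the values follow from the stated rules rather than the source):
BinomialDistribution b = BinomialDistribution.of(13, 0.15);
double low = b.cumulativeProbability(-1);  // 0.0 (below the support)
double high = b.cumulativeProbability(20); // 1.0 (above the support)
double pmf = b.probability(-1);            // 0.0
// Invalid probability argument
b.inverseCumulativeProbability(1.5);       // IllegalArgumentException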
Complementary Probabilities
The distributions provide the cumulative probability p and its complement,
the survival probability, q = 1 - p . When the probability
q is small use of the cumulative probability to compute q can
result in dramatic loss of accuracy. This is due to the distribution of floating-point
numbers having a
log-uniform
distribution as the limiting distribution. There are far more
representable numbers as the probability value approaches zero than when it approaches
one.
The difference is illustrated with the result of computing the upper tail of a
probability distribution.
ChiSquaredDistribution chi2 = ChiSquaredDistribution.of(42);
double q1 = 1 - chi2.cumulativeProbability(168);
double q2 = chi2.survivalProbability(168);
// q1 == 0
// q2 != 0
In this case the value 1 - p has only a single bit of information as
x approaches 168. For example the value 1 - p(x=167)
is 2^-53 (approximately 1.11e-16).
The complement q retains information
much further into the long tail, as shown in the following table:
Chi-squared distribution, 42 degrees of freedom

  x  |  1 - p   |    q
 166 | 1.11e-16 | 1.16e-16
 167 | 1.11e-16 | 7.96e-17
 168 | 0        | 5.43e-17
 ... |          |
 200 | 0        | 1.19e-22
Probability computations should use the appropriate cumulative or survival function
to calculate the lower or upper tail respectively. The same care should be applied
when inverting probability distributions. It is preferred to compute either
p ≤ 0.5 or q ≤ 0.5 without loss of accuracy and then
invert respectively the cumulative probability using p or the survival
probability using q to obtain x.
ChiSquaredDistribution chi2 = ChiSquaredDistribution.of(42);
double q = 5.43e-17;
// Incorrect: p = 1 - q == 1.0 !!!
double x1 = chi2.inverseCumulativeProbability(1 - q);
// Correct: invert q
double x2 = chi2.inverseSurvivalProbability(q);
// x1 == +infinity
// x2 ~ 168.0
Note: The survival probability functions were not present in the
org.apache.commons.math3.distribution package. Users upgrading from
commons-math
should update usage of the cumulative probability functions where appropriate.
Inference
The commons-statistics-inference module provides hypothesis testing.
Overview
The module provides test classes that implement a single statistical test, or a family
of related tests. Each test class provides methods to compute a test statistic and a p-value for the
significance of the statistic. These can be computed together using a test
method and returned as a
SignificanceResult.
The SignificanceResult has a method that can be used to reject
the null hypothesis at the provided significance level. Test classes may extend the
SignificanceResult to return more information about the test result,
for example the computed degrees of freedom.
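A sketch of such an extended result (assuming a TTest.Result type with a
getDegreesOfFreedom accessor and a two-sample test method):
double[] x = {53, 69, 65, 65, 67};
double[] y = {75, 65, 68, 63, 55};
TTest.Result r = TTest.withDefaults().test(x, y);
double df = r.getDegreesOfFreedom(); // additional information beyond the p-value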
Alternatively a statistic method is provided to compute only the
statistic as a double value. This statistic can be compared to a pre-computed
critical value, for example from a table of critical values.
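A sketch of the statistic-only computation (assuming the statistic method accepts
the same arguments as test; 5.991 is the standard chi-squared critical value for
alpha = 0.05, df = 2):
double[] expected = {0.25, 0.5, 0.25};
long[] observed = {57, 123, 38};
double statistic = ChiSquareTest.withDefaults().statistic(expected, observed);
// Compare to a pre-computed critical value from a table
boolean reject = statistic > 5.991;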
A test is obtained using the withDefaults() method to return the test with
all options set to their default values. Any test options can be configured using
property change methods that return a new instance of the test. Tests that support an
alternative hypothesis will use a two-sided test by default. Tests that support multiple
p-value methods will default to an appropriate computation for the size of the input
data. Unless otherwise noted, test instances are immutable.
Examples
A chi-square test that the observed counts conform to the expected frequencies.
double[] expected = {0.25, 0.5, 0.25};
long[] observed = {57, 123, 38};
SignificanceResult result = ChiSquareTest.withDefaults()
    .test(expected, observed);
result.getPValue(); // 0.0316148
result.reject(0.05); // true
result.reject(0.01); // false
A paired t-test that the marks in the math exam were greater than the science
exam. This fails to reject the null hypothesis (that there was no difference) with
95% confidence.
double[] math = {53, 69, 65, 65, 67, 79, 86, 65, 62, 69}; // mean = 68.0
double[] science = {75, 65, 68, 63, 55, 65, 73, 45, 51, 52}; // mean = 61.2
SignificanceResult result = TTest.withDefaults()
    .with(AlternativeHypothesis.GREATER_THAN)
    .pairedTest(math, science);
result.getPValue(); // 0.05764
result.reject(0.05); // false
A G-test that the allele frequencies conform to the expected Hardy-Weinberg proportions.
This is an example of an intrinsic hypothesis where the expected frequencies are computed
using the observations and the degrees of freedom must be adjusted.
The data is from McDonald (1989) Selection component analysis
of the Mpi locus in the amphipod Platorchestia platensis.
Heredity 62: 243-249.
// Allele frequencies: Mpi 90/90, Mpi 90/100, Mpi 100/100
long[] observed = {1203, 2919, 1678};
// Mpi 90 proportion
double p = (2.0 * observed[0] + observed[1]) /
           (2 * Arrays.stream(observed).sum()); // 5325 / 11600 = 0.459
// Hardy-Weinberg proportions
double[] expected = {p * p, 2 * p * (1 - p), (1 - p) * (1 - p)};
// 0.211, 0.497, 0.293
SignificanceResult result = GTest.withDefaults()
    .withDegreesOfFreedomAdjustment(1)
    .test(expected, observed);
result.getStatistic(); // 1.03
result.getPValue(); // 0.309
result.reject(0.05); // false
A one-way analysis of variance test. This is an example where the result has more
information than the test statistic and the p-value.
The data is from McDonald et al (1991) Allozymes and morphometric characters of
three species of Mytilus in the Northern and Southern Hemispheres.
Marine Biology 111: 323-333.
double[] tillamook = {0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836};
double[] newport = {0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725};
double[] petersburg = {0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105};
double[] magadan = {0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689};
double[] tvarminne = {0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045};
Collection<double[]> data = Arrays.asList(tillamook, newport, petersburg, magadan, tvarminne);
OneWayAnova.Result result = OneWayAnova.withDefaults()
    .test(data);
result.getStatistic(); // 7.12
result.getPValue(); // 2.8e-4
result.reject(0.001); // true
The result also provides the between and within group degrees of freedom and the mean
squares allowing reporting of the results in a table:
               | degrees of freedom | mean square |  F   |   p
between groups |         4          |  0.001113   | 7.12 | 2.8e-4
within groups  |        34          |  0.000159   |      |
Interval
The commons-statistics-interval module provides statistical intervals.
The Interval interface provides a bounded interval with a lower and upper
bound: \( [l, u] \).
The BinomialConfidenceInterval enumeration provides methods
to create a confidence interval for a binomial proportion. This is an interval
containing the probability of success given a series of success-failure experiments.
The interval is constructed using a confidence level. For example a 95% confidence interval
will contain the true proportion of successes 95% of the times that the procedure
for constructing the confidence interval is employed. The target error rate \( \alpha \)
is defined as \( 1 - confidence \) when expressing the confidence level as a probability
in \( (0, 1) \).
The following example demonstrates an ideal coin toss experiment. Note how the 95%
confidence interval containing the true probability narrows as the number of trials
increases.
BinomialConfidenceInterval method = BinomialConfidenceInterval.WILSON_SCORE;
double alpha = 0.05;
Interval interval = method.fromErrorRate(10, 5, alpha);
interval.getLowerBound(); // 0.23659
interval.getUpperBound(); // 0.76341
method.fromErrorRate(100, 50, alpha); // 0.40383, 0.59617
method.fromErrorRate(1000, 500, alpha); // 0.46907, 0.53093
method.fromErrorRate(10000, 5000, alpha); // 0.49020, 0.50980
The NormalConfidenceInterval enumeration provides methods
to create a confidence interval for a normally distributed population.
Intervals can be created for the mean or the variance from a sample of
the population.
The following example demonstrates how to generate a 95% confidence interval
for the mean given a sample. The mean and variance of the sample are
required for the interval; here they are generated using the descriptive
statistics API.
double[] x = {
1.47, 1.40, 1.55, 1.44, 1.41,
1.38, 1.53, 1.42, 1.55, 1.55,
1.31, 1.37, 1.53, 1.47, 1.51
};
DoubleStatistics stats = DoubleStatistics.of(EnumSet.of(Statistic.MEAN, Statistic.VARIANCE), x);
double mean = stats.getAsDouble(Statistic.MEAN); // 1.46
double variance = stats.getAsDouble(Statistic.VARIANCE); // 0.0058
long n = stats.getCount(); // 15
double alpha = 0.05;
Interval interval = NormalConfidenceInterval.MEAN.fromErrorRate(mean, variance, n, alpha);
interval.getLowerBound(); // 1.4170
interval.getUpperBound(); // 1.5017
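The variance interval follows the same pattern (a hypothetical sketch; the
VARIANCE constant and its exact fromErrorRate arguments are assumptions,
check the class javadoc):
// 95% confidence interval for the population variance
Interval varInterval = NormalConfidenceInterval.VARIANCE.fromErrorRate(mean, variance, n, alpha);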
Ranking
The commons-statistics-ranking module provides rank transformations.
The NaturalRanking class provides a ranking based on the natural ordering
of floating-point values. Ranks are assigned to the input numbers in ascending order,
starting from 1.
NaturalRanking ranking = new NaturalRanking();
ranking.apply(new double[] {5, 6, 7, 8}); // 1, 2, 3, 4
ranking.apply(new double[] {8, 5, 7, 6}); // 4, 1, 3, 2
The special case of NaN values is handled using the configured
NaNStrategy. The default is to raise an exception.
double[] data = new double[] {6, 5, Double.NaN, 7};
new NaturalRanking().apply(data);                    // IllegalArgumentException
new NaturalRanking(NaNStrategy.MINIMAL).apply(data); // (3, 2, 1, 4)
new NaturalRanking(NaNStrategy.MAXIMAL).apply(data); // (2, 1, 4, 3)
new NaturalRanking(NaNStrategy.REMOVED).apply(data); // (2, 1, 3)
new NaturalRanking(NaNStrategy.FIXED).apply(data);   // (2, 1, NaN, 3)
new NaturalRanking(NaNStrategy.FAILED).apply(data);  // IllegalArgumentException
Ties are handled using the configured TiesStrategy. The default is to
use an average.
double[] data = new double[] {7, 5, 7, 6};
new NaturalRanking().apply(data); // (3.5, 1, 3.5, 2)
new NaturalRanking(TiesStrategy.SEQUENTIAL).apply(data); // (3, 1, 4, 2)
new NaturalRanking(TiesStrategy.MINIMUM).apply(data); // (3, 1, 3, 2)
new NaturalRanking(TiesStrategy.MAXIMUM).apply(data); // (4, 1, 4, 2)
new NaturalRanking(TiesStrategy.AVERAGE).apply(data); // (3.5, 1, 3.5, 2)
new NaturalRanking(TiesStrategy.RANDOM).apply(data); // (3, 1, 4, 2) or (4, 1, 3, 2)
The source of randomness defaults to a system-supplied generator. The randomness can be
provided as a LongSupplier of random 64-bit values.
double[] data = new double[] {7, 5, 7, 6};
new NaturalRanking(TiesStrategy.RANDOM).apply(data);
new NaturalRanking(new SplittableRandom()::nextLong).apply(data);
// From Commons RNG
UniformRandomProvider rng = RandomSource.KISS.create();
new NaturalRanking(rng::nextLong).apply(data);
// ranks: (3, 1, 4, 2) or (4, 1, 3, 2)