Class Percentile
- java.lang.Object
- org.apache.commons.math4.legacy.stat.descriptive.AbstractUnivariateStatistic
- org.apache.commons.math4.legacy.stat.descriptive.rank.Percentile
- All Implemented Interfaces:
MathArrays.Function, UnivariateStatistic
- Direct Known Subclasses:
Median
public class Percentile extends AbstractUnivariateStatistic
Provides percentile computation.
There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:
- Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
- If n = 1 return the unique array element (regardless of the value of p); otherwise
- Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference, d, between pos and floor(pos) (i.e. the fractional part of pos).
- If pos < 1 return the smallest element in the array.
- Else if pos >= n return the largest element in the array.
- Else let lower be the element in position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower)
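The steps listed above can be sketched in plain Java. This is a minimal illustration only, not the library's implementation: it fully sorts a copy of the input rather than using selection. The names LegacyPercentileSketch and estimate are hypothetical. The sketch assumes that positions are 1-based, and returns Double.NaN for an empty array to mirror the evaluate documentation on this page.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch of the estimation steps described above. */
final class LegacyPercentileSketch {

    private LegacyPercentileSketch() { }

    /** Estimates the p-th percentile of values, with 0 < p <= 100. */
    static double estimate(double[] values, double p) {
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        final int n = values.length;
        if (n == 0) {
            return Double.NaN;          // assumption: mirrors the documented empty-array result
        }
        if (n == 1) {
            return values[0];           // unique element, regardless of p
        }
        final double[] sorted = values.clone();
        Arrays.sort(sorted);            // NaN sorts last, as with Double.compareTo
        final double pos = p * (n + 1) / 100;
        final double floorPos = Math.floor(pos);
        final double d = pos - floorPos; // fractional part of pos
        if (pos < 1) {
            return sorted[0];           // smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];       // largest element
        }
        // Positions are 1-based in the description, so shift by one for array access.
        final double lower = sorted[(int) floorPos - 1];
        final double upper = sorted[(int) floorPos];
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        // pos = 50 * 5 / 100 = 2.5, so the result interpolates halfway between 2 and 3.
        System.out.println(estimate(new double[] {1, 2, 3, 4}, 50)); // prints 2.5
    }
}
```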
To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by Double.compareTo(Double). This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values returned.
Further, to include different estimation types such as R1, R2 as mentioned in the Wikipedia page on quantiles, a type-specific NaN handling strategy is used to closely match the results typically observed from popular tools such as R (R1-R9) and Excel (R7).
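The ordering described above can be checked with the JDK alone. The following sketch uses only java.util.Arrays and java.lang.Double; the class name NaNOrderingSketch and the helper sortedCopy are hypothetical and are not part of the library.

```java
import java.util.Arrays;

/** Hypothetical sketch showing how the JDK orders NaN and infinite values. */
final class NaNOrderingSketch {

    private NaNOrderingSketch() { }

    /** Returns a sorted copy; Arrays.sort places NaN after every other value. */
    static double[] sortedCopy(double[] values) {
        final double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        final double[] sorted =
            sortedCopy(new double[] {Double.NaN, 4, Double.POSITIVE_INFINITY, 0, 2});
        // NaN ends up in the last slot, after positive infinity.
        System.out.println(Arrays.toString(sorted));

        // Double.compare (and Double.compareTo) rank NaN above positive infinity.
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0);

        // Interpolating towards a NaN neighbour yields NaN.
        final double lower = 4;
        final double upper = Double.NaN;
        final double d = 0.5;
        System.out.println(lower + d * (upper - lower));
    }
}
```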
Since 2.2, Percentile uses only selection instead of complete sorting and caches selection algorithm state between calls to the various evaluate methods. This greatly improves efficiency, both for a single percentile and multiple percentile computations. To maximize performance when multiple percentiles are computed based on the same data, users should set the data array once using either one of the evaluate(double[], double) or setData(double[]) methods and thereafter call evaluate(double) with just the percentile provided.
Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes the increment() or clear() method, it must be synchronized externally.
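To illustrate what "selection instead of complete sorting" means, here is a generic quickselect sketch. It is not the library's KthSelector and does not cache any state between calls. The names SelectionSketch and select are hypothetical, and the middle-element pivot is an assumption chosen for brevity. Elements are compared with Double.compare so that NaN is treated as the largest value, consistent with the ordering described above.

```java
/** Hypothetical quickselect sketch; not the library's KthSelector. */
final class SelectionSketch {

    private SelectionSketch() { }

    /**
     * Returns the k-th smallest element (0-based) of work.
     * The array is only partially reordered, in place.
     */
    static double select(double[] work, int k) {
        if (k < 0 || k >= work.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int lo = 0;
        int hi = work.length - 1;
        while (lo < hi) {
            final int p = partition(work, lo, hi);
            if (k == p) {
                return work[k];
            } else if (k < p) {
                hi = p - 1;   // target lies in the left part
            } else {
                lo = p + 1;   // target lies in the right part
            }
        }
        return work[k];
    }

    /** Lomuto partition around the middle element; returns the pivot's final index. */
    private static int partition(double[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        final double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (Double.compare(a[i], pivot) < 0) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        final double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Only the part of the array that contains the requested position is partitioned further, which is why a single percentile does not need a full sort.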
-
-
Nested Class Summary
Nested Classes
- static class Percentile.EstimationType: An enum for the various percentile estimation strategies described in the Wikipedia article on quantiles, with enum names matching the type names used there.
-
Constructor Summary
Constructors
- Percentile(): Constructs a Percentile with default settings.
- Percentile(double quantile): Constructs a Percentile with the specific quantile value and otherwise default settings.
- protected Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector): Constructs a Percentile with the specific quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
- Percentile(Percentile original): Copy constructor, creates a new Percentile identical to the original.
-
Method Summary
- Percentile copy(): Returns a copy of the statistic with the same internal state.
- double evaluate(double p): Returns the result of evaluating the statistic over the stored data.
- double evaluate(double[] values, double p): Returns an estimate of the pth percentile of the values in the values array.
- double evaluate(double[] values, double[] sampleWeights, double p): Returns an estimate of the pth percentile of the values in the values array with their weights.
- double evaluate(double[] values, double[] sampleWeights, int start, int length): Returns an estimate of the weighted quantileth percentile of the designated values in the values array.
- double evaluate(double[] values, double[] sampleWeights, int begin, int length, double p): Returns an estimate of the pth percentile of the values in the values array with sampleWeights, starting with the element in (0-based) position begin in the array and including length values.
- double evaluate(double[] values, int start, int length): Returns an estimate of the quantileth percentile of the designated values in the values array.
- double evaluate(double[] values, int begin, int length, double p): Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.
- Percentile.EstimationType getEstimationType(): Get the estimation type used for computation.
- KthSelector getKthSelector(): Get the kthSelector used for computation.
- NaNStrategy getNaNStrategy(): Get the NaN handling strategy used for computation.
- PivotingStrategy getPivotingStrategy(): Get the PivotingStrategy used in KthSelector for computation.
- double getQuantile(): Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- protected double[] getWorkArray(double[] values, double[] sampleWeights, int begin, int length): Get the work arrays of weights to operate on.
- void setData(double[] values): Set the data array.
- void setData(double[] values, double[] sampleWeights): Set the data array.
- void setData(double[] values, double[] sampleWeights, int begin, int length): Set the data and weights arrays.
- void setData(double[] values, int begin, int length): Set the data array.
- void setQuantile(double p): Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- Percentile withEstimationType(Percentile.EstimationType newEstimationType): Build a new instance similar to the current one except for the estimation type.
- Percentile withKthSelector(KthSelector newKthSelector): Build a new instance similar to the current one except for the kthSelector instance specifically set.
- Percentile withNaNStrategy(NaNStrategy newNaNStrategy): Build a new instance similar to the current one except for the NaN handling strategy.
Methods inherited from class org.apache.commons.math4.legacy.stat.descriptive.AbstractUnivariateStatistic
evaluate, evaluate, getData, getDataRef
Constructor Detail
-
Percentile
public Percentile()
Constructs a Percentile with the following defaults.
- default quantile: 50.0, can be reset with setQuantile(double)
- default estimation type: Percentile.EstimationType.LEGACY, can be reset with withEstimationType(EstimationType)
- default NaN strategy: NaNStrategy.REMOVED, can be reset with withNaNStrategy(NaNStrategy)
- a KthSelector that makes use of MedianOf3PivotingStrategy, can be reset with withKthSelector(KthSelector)
-
Percentile
public Percentile(double quantile)
Constructs a Percentile with the specific quantile value and the following defaults.
- default method type: Percentile.EstimationType.LEGACY
- default NaN strategy: NaNStrategy.REMOVED
- a Kth Selector: KthSelector
- Parameters:
quantile - the quantile
- Throws:
MathIllegalArgumentException - if quantile is not greater than 0 and less than or equal to 100
-
Percentile
public Percentile(Percentile original)
Copy constructor, creates a new Percentile identical to the original.
- Parameters:
original - the Percentile instance to copy. Cannot be null.
-
Percentile
protected Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector)
Constructs a Percentile with the specific quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
- Parameters:
quantile - the quantile to be computed
estimationType - one of the percentile estimation types
nanStrategy - one of NaNStrategy to handle NaNs. Cannot be null.
kthSelector - a KthSelector to use for pivoting during search
- Throws:
MathIllegalArgumentException - if quantile is not within (0, 100]
-
-
Method Detail
-
setData
public void setData(double[] values)
Set the data array.
The stored value is a copy of the parameter array, not the array itself.
- Overrides:
setData in class AbstractUnivariateStatistic
- Parameters:
values - data array to store (may be null to remove stored data)
- See Also:
AbstractUnivariateStatistic.evaluate()
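The practical effect of storing a copy rather than the array itself can be sketched as follows. StoredDataSketch is a hypothetical stand-in, not the library class. Returning a copy from getData is an additional assumption made for this illustration.

```java
/** Hypothetical holder that stores a defensive copy of the data it is given. */
final class StoredDataSketch {

    private double[] stored;

    /** Stores a copy of values, or clears the stored data when values is null. */
    void setData(double[] values) {
        stored = (values == null) ? null : values.clone();
    }

    /** Returns a copy of the stored data, or null if nothing is stored. */
    double[] getData() {
        return (stored == null) ? null : stored.clone();
    }

    public static void main(String[] args) {
        final double[] input = {1, 2, 3};
        final StoredDataSketch holder = new StoredDataSketch();
        holder.setData(input);
        input[0] = 99;                              // later changes to the caller's array...
        System.out.println(holder.getData()[0]);    // ...do not reach the stored copy
    }
}
```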
-
setData
public void setData(double[] values, double[] sampleWeights)
Set the data array.
- Parameters:
values - Data array. Cannot be null.
sampleWeights - corresponding positive and non-NaN weights. Cannot be null.
- Throws:
MathIllegalArgumentException - if lengths of values and weights are not equal.
NotANumberException - if any weight is NaN
NotStrictlyPositiveException - if any weight is not positive
-
setData
public void setData(double[] values, int begin, int length)
Set the data array. The input array is copied, not referenced.
- Overrides:
setData in class AbstractUnivariateStatistic
- Parameters:
values - data array to store
begin - the index of the first element to include
length - the number of elements to include
- See Also:
AbstractUnivariateStatistic.evaluate()
-
setData
public void setData(double[] values, double[] sampleWeights, int begin, int length)
Set the data and weights arrays. The input arrays are copied, not referenced.
- Parameters:
values - Data array. Cannot be null.
sampleWeights - corresponding positive and non-NaN weights. Cannot be null.
begin - the index of the first element to include
length - the number of elements to include
- Throws:
MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
NotPositiveException - if begin or length is negative
NumberIsTooLargeException - if begin + length is greater than values.length
NotANumberException - if any weight is NaN
NotStrictlyPositiveException - if any weight is not positive
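The documented preconditions can be summarized as a validation routine. The sketch below is an assumption-laden illustration, not the library's code. WeightedDataChecks and check are hypothetical names. IllegalArgumentException stands in for the library's specific exception types, and checking only the weights inside the selected range is an assumption.

```java
/** Hypothetical precondition checks mirroring the documented exceptions. */
final class WeightedDataChecks {

    private WeightedDataChecks() { }

    /** Throws IllegalArgumentException if any documented precondition is violated. */
    static void check(double[] values, double[] weights, int begin, int length) {
        if (values == null || weights == null) {
            throw new IllegalArgumentException("values and weights must not be null");
        }
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights differ in length");
        }
        if (begin < 0 || length < 0) {
            throw new IllegalArgumentException("begin and length must not be negative");
        }
        // Use long arithmetic so that begin + length cannot overflow.
        if ((long) begin + length > values.length) {
            throw new IllegalArgumentException("begin + length exceeds values.length");
        }
        // Assumption: only weights in [begin, begin + length) are inspected.
        for (int i = begin; i < begin + length; i++) {
            if (Double.isNaN(weights[i])) {
                throw new IllegalArgumentException("weight " + i + " is NaN");
            }
            if (weights[i] <= 0) {
                throw new IllegalArgumentException("weight " + i + " is not strictly positive");
            }
        }
    }
}
```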
-
evaluate
public double evaluate(double p)
Returns the result of evaluating the statistic over the stored data. If weights have been set, it will compute the weighted percentile.
The stored array is the one which was set by previous calls to setData(double[]) or setData(double[], double[], int, int).
- Parameters:
p - the percentile value to compute
- Returns:
the value of the statistic applied to the stored data
- Throws:
MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
NotPositiveException - if begin or length is negative
NotStrictlyPositiveException - if any weight is not positive
NotANumberException - if any weight is NaN
OutOfRangeException - if p is invalid (p must be greater than 0 and less than or equal to 100)
NumberIsTooLargeException - if begin + length is greater than values.length
-
evaluate
public double evaluate(double[] values, double p)
Returns an estimate of the pth percentile of the values in the values array.
Calls to this method do not modify the internal quantile state of this statistic.
- Returns Double.NaN if values has length 0
- Returns (for any value of p) values[0] if values has length 1
- Throws MathIllegalArgumentException if values is null or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - input array of values
p - the percentile value to compute
- Returns:
the percentile value or Double.NaN if the array is empty
- Throws:
MathIllegalArgumentException - if values is null or p is invalid
-
evaluate
public double evaluate(double[] values, double[] sampleWeights, double p)
Returns an estimate of the pth percentile of the values in the values array with their weights.
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - input array of values
sampleWeights - weights of values
p - the percentile value to compute
- Returns:
the weighted percentile value or Double.NaN if the array is empty
- Throws:
MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
NotPositiveException - if begin or length is negative
NotStrictlyPositiveException - if any weight is not positive
NotANumberException - if any weight is NaN
OutOfRangeException - if p is invalid
NumberIsTooLargeException - if begin + length is greater than values.length
-
evaluate
public double evaluate(double[] values, int start, int length)
Returns an estimate of the quantileth percentile of the designated values in the values array. The quantile estimated is determined by the quantile property.
- Returns Double.NaN if length = 0
- Returns (for any value of quantile) values[start] if length = 1
- Throws MathIllegalArgumentException if values is null, or start or length is invalid
See Percentile for a description of the percentile estimation algorithm used.
- Specified by:
evaluate in interface MathArrays.Function
- Specified by:
evaluate in interface UnivariateStatistic
- Specified by:
evaluate in class AbstractUnivariateStatistic
- Parameters:
values - the input array
start - index of the first array element to include
length - the number of elements to include
- Returns:
the percentile value
- Throws:
MathIllegalArgumentException - if the parameters are not valid
-
evaluate
public double evaluate(double[] values, double[] sampleWeights, int start, int length)
Returns an estimate of the weighted quantileth percentile of the designated values in the values array. The quantile estimated is determined by the quantile property.
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - the input array
sampleWeights - the weights of values
start - index of the first array element to include
length - the number of elements to include
- Returns:
the percentile value
- Throws:
MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
NotPositiveException - if start or length is negative
NotStrictlyPositiveException - if any weight is not positive
NotANumberException - if any weight is NaN
OutOfRangeException - if the quantile is invalid
NumberIsTooLargeException - if start + length is greater than values.length
-
evaluate
public double evaluate(double[] values, int begin, int length, double p)
Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.
Calls to this method do not modify the internal quantile state of this statistic.
- Returns Double.NaN if length = 0
- Returns (for any value of p) values[begin] if length = 1
- Throws MathIllegalArgumentException if values is null, begin or length is invalid, or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - array of input values
p - the percentile to compute
begin - the first (0-based) element to include in the computation
length - the number of array elements to include
- Returns:
the percentile value.
- Throws:
MathIllegalArgumentException - if the parameters are not valid.
-
evaluate
public double evaluate(double[] values, double[] sampleWeights, int begin, int length, double p)
Returns an estimate of the pth percentile of the values in the values array with sampleWeights, starting with the element in (0-based) position begin in the array and including length values.
See Percentile for a description of the percentile estimation algorithm used.
- Parameters:
values - array of input values
sampleWeights - positive and non-NaN weights of values
begin - the first (0-based) element to include in the computation
length - the number of array elements to include
p - percentile to compute
- Returns:
the weighted percentile value
- Throws:
MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
NotPositiveException - if begin or length is negative
NotStrictlyPositiveException - if any weight is not positive
NotANumberException - if any weight is NaN
OutOfRangeException - if p is invalid
NumberIsTooLargeException - if begin + length is greater than values.length
-
getQuantile
public double getQuantile()
Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- Returns:
the quantile set during construction or by setQuantile(double)
-
setQuantile
public void setQuantile(double p)
Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
- Parameters:
p - a value such that 0 < p <= 100
- Throws:
MathIllegalArgumentException - if p is not greater than 0 and less than or equal to 100
-
copy
public Percentile copy()
Returns a copy of the statistic with the same internal state.
- Specified by:
copy in interface UnivariateStatistic
- Specified by:
copy in class AbstractUnivariateStatistic
- Returns:
a copy of the statistic
-
getWorkArray
protected double[] getWorkArray(double[] values, double[] sampleWeights, int begin, int length)
Get the work arrays of weights to operate on.
- Parameters:
values - the array of numbers
sampleWeights - the array of weights
begin - index to start reading the array
length - the length of array to be read from the begin index
- Returns:
work array sliced from values in the range [begin, begin + length)
-
getEstimationType
public Percentile.EstimationType getEstimationType()
Get the estimation type used for computation.
- Returns:
the estimationType set
-
withEstimationType
public Percentile withEstimationType(Percentile.EstimationType newEstimationType)
Build a new instance similar to the current one except for the estimation type.
This method is intended to be used as part of a fluent-type builder pattern. Building finely tuned instances should be done as follows:
    Percentile customized = new Percentile(quantile)
        .withEstimationType(estimationType)
        .withNaNStrategy(nanStrategy)
        .withKthSelector(kthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
- Parameters:
newEstimationType - estimation type for the new instance. Cannot be null.
- Returns:
a new instance, with changed estimation type
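The "new instance similar to the current one except for" behaviour of the withXxx methods can be illustrated with a small immutable class. EstimatorConfig is a hypothetical stand-in, not the library class. The string values are placeholders for the library's enum constants, and only two of the customization parameters are modelled.

```java
import java.util.Objects;

/** Hypothetical immutable configuration showing the withXxx pattern. */
final class EstimatorConfig {

    private final double quantile;
    private final String estimationType;   // placeholder for an enum constant
    private final String nanStrategy;      // placeholder for an enum constant

    /** Uses placeholder defaults for everything except the quantile. */
    EstimatorConfig(double quantile) {
        this(quantile, "LEGACY", "REMOVED");
    }

    private EstimatorConfig(double quantile, String estimationType, String nanStrategy) {
        this.quantile = quantile;
        this.estimationType = estimationType;
        this.nanStrategy = nanStrategy;
    }

    /** Returns a new instance; the current one is left unchanged. */
    EstimatorConfig withEstimationType(String newEstimationType) {
        return new EstimatorConfig(quantile,
                                   Objects.requireNonNull(newEstimationType),
                                   nanStrategy);
    }

    /** Returns a new instance; the current one is left unchanged. */
    EstimatorConfig withNaNStrategy(String newNaNStrategy) {
        return new EstimatorConfig(quantile,
                                   estimationType,
                                   Objects.requireNonNull(newNaNStrategy));
    }

    double getQuantile() {
        return quantile;
    }

    String getEstimationType() {
        return estimationType;
    }

    String getNaNStrategy() {
        return nanStrategy;
    }
}
```

Because each call returns a fresh object, calls can be chained, and any parameter that is not customized keeps the value it had in the instance the chain started from.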
-
getNaNStrategy
public NaNStrategy getNaNStrategy()
Get the NaN handling strategy used for computation.
- Returns:
the NaN handling strategy set during construction
-
withNaNStrategy
public Percentile withNaNStrategy(NaNStrategy newNaNStrategy)
Build a new instance similar to the current one except for the NaN handling strategy.
This method is intended to be used as part of a fluent-type builder pattern. Building finely tuned instances should be done as follows:
    Percentile customized = new Percentile(quantile)
        .withEstimationType(estimationType)
        .withNaNStrategy(nanStrategy)
        .withKthSelector(kthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
- Parameters:
newNaNStrategy - NaN strategy for the new instance. Cannot be null.
- Returns:
a new instance, with changed NaN handling strategy
-
getKthSelector
public KthSelector getKthSelector()
Get the kthSelector used for computation.
- Returns:
the kthSelector set
-
getPivotingStrategy
public PivotingStrategy getPivotingStrategy()
Get the PivotingStrategy used in KthSelector for computation.
- Returns:
the pivoting strategy set
-
withKthSelector
public Percentile withKthSelector(KthSelector newKthSelector)
Build a new instance similar to the current one except for the kthSelector instance specifically set.
This method is intended to be used as part of a fluent-type builder pattern. Building finely tuned instances should be done as follows:
    Percentile customized = new Percentile(quantile)
        .withEstimationType(estimationType)
        .withNaNStrategy(nanStrategy)
        .withKthSelector(newKthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
- Parameters:
newKthSelector - KthSelector for the new instance. Cannot be null.
- Returns:
a new instance, with changed KthSelector
-
-