Class Percentile

  • All Implemented Interfaces:
    MathArrays.Function, UnivariateStatistic
    Direct Known Subclasses:
    Median

    public class Percentile
    extends AbstractUnivariateStatistic
    Provides percentile computation.

    There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:

    1. Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
    2. If n = 1 return the unique array element (regardless of the value of p); otherwise
    3. Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference d between pos and floor(pos) (i.e., the fractional part of pos).
    4. If pos < 1 return the smallest element in the array.
    5. Else if pos >= n return the largest element in the array.
    6. Else let lower be the element at position floor(pos) (1-based) in the sorted array and let upper be the next element. Return lower + d * (upper - lower).
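    The steps above can be sketched in plain Java as follows. This is a minimal illustration, not the library's code. The class name LegacyPercentileSketch is hypothetical, the input is assumed to be non-empty with 0 < p <= 100, and for readability the sketch fully sorts a copy with Arrays.sort, whereas Percentile itself relies on selection.

```java
import java.util.Arrays;

/** Hypothetical sketch of steps 1-6; fully sorts a copy for clarity instead of using selection. */
final class LegacyPercentileSketch {

    private LegacyPercentileSketch() { }

    /** Estimates the p-th percentile, assuming values.length >= 1 and 0 < p <= 100. */
    static double estimate(double[] values, double p) {
        double[] sorted = values.clone();          // leave the caller's array untouched
        Arrays.sort(sorted);                       // NaN sorts last, as with Double.compareTo
        int n = sorted.length;
        if (n == 1) {
            return sorted[0];                      // step 2: unique element, for any p
        }
        double pos = p * (n + 1) / 100.0;          // step 3: estimated 1-based position
        int k = (int) Math.floor(pos);
        double d = pos - k;                        // fractional part of pos
        if (pos < 1) {
            return sorted[0];                      // step 4: smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];                  // step 5: largest element
        }
        double lower = sorted[k - 1];              // element at 1-based position floor(pos)
        double upper = sorted[k];                  // the next element
        return lower + d * (upper - lower);        // step 6: linear interpolation
    }

    public static void main(String[] args) {
        double[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(estimate(data, 50));    // pos = 4.5, so 3 + 0.5 * (4 - 3) = 3.5
    }
}
```

    With the sample {3, 1, 4, 1, 5, 9, 2, 6}, n = 8 and p = 50 give pos = 4.5. The sketch therefore interpolates halfway between the 4th and 5th sorted elements (3 and 4) and returns 3.5.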

    To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by Double.compareTo(Double). This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
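    You can check the ordering and the 2.5 result with the JDK alone. In this sketch (hypothetical class NaNOrderingSketch), sortedCopy relies on Arrays.sort(double[]) placing NaN last. The median method applies the interpolation rule from the algorithm steps and assumes an already sorted array of at least two elements.

```java
import java.util.Arrays;

/** Hypothetical illustration of the Double.compareTo ordering that puts NaN above every other value. */
final class NaNOrderingSketch {

    private NaNOrderingSketch() { }

    /** Returns a sorted copy; Arrays.sort(double[]) places NaN after +Infinity. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Median of an already sorted array by the interpolation rule above (n >= 2 assumed). */
    static double median(double[] sorted) {
        int n = sorted.length;
        double pos = 50.0 * (n + 1) / 100.0;       // lies in [1.5, n) whenever n >= 2
        int k = (int) Math.floor(pos);
        return sorted[k - 1] + (pos - k) * (sorted[k] - sorted[k - 1]);
    }

    public static void main(String[] args) {
        double[] sorted = sortedCopy(new double[] {Double.NaN, 4, 0, 3, 2, 1});
        System.out.println(Arrays.toString(sorted)); // [0.0, 1.0, 2.0, 3.0, 4.0, NaN]
        System.out.println(median(sorted));          // pos = 3.5, so 2 + 0.5 * (3 - 2) = 2.5
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0); // true
    }
}
```

    For {0, 1, 2, 3, 4, NaN}, n = 6 gives pos = 3.5. The NaN in the last position never enters the interpolation between 2 and 3.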

    Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values returned.

    Further, to support the different estimation types (such as R1 and R2) described in the Wikipedia article on quantiles, a type-specific NaN handling strategy is used so that results closely match those typically produced by popular tools such as R (types R1 to R9) and Excel (R7).
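    To make the small-sample differences concrete, the sketch below compares the interpolation rule described above with R-7, the definition R uses by default and Excel's PERCENTILE function uses. Both functions are illustrative re-implementations in a hypothetical class, EstimationComparisonSketch. They assume a sorted, NaN-free, non-empty array and 0 < p <= 100, and they do not call into Percentile.EstimationType.

```java
/** Hypothetical comparison of the default rule above with R-7 on a sorted sample. */
final class EstimationComparisonSketch {

    private EstimationComparisonSketch() { }

    /** Default rule from the steps above: pos = p (n + 1) / 100. Assumes sorted, n >= 1. */
    static double legacy(double[] sorted, double p) {
        int n = sorted.length;
        double pos = p * (n + 1) / 100.0;
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int k = (int) Math.floor(pos);
        return sorted[k - 1] + (pos - k) * (sorted[k] - sorted[k - 1]);
    }

    /** R-7 (R's default, Excel's PERCENTILE): h = (n - 1) p / 100 + 1, a 1-based position. */
    static double r7(double[] sorted, double p) {
        int n = sorted.length;
        double h = (n - 1) * (p / 100.0) + 1;      // lies in [1, n] for 0 < p <= 100
        int k = (int) Math.floor(h);
        if (k >= n) {
            return sorted[n - 1];                  // h == n: no upper neighbour
        }
        return sorted[k - 1] + (h - k) * (sorted[k] - sorted[k - 1]);
    }

    public static void main(String[] args) {
        double[] sorted = {1, 2, 3, 4};
        System.out.println(legacy(sorted, 25) + " vs " + r7(sorted, 25)); // 1.25 vs 1.75
        System.out.println(legacy(sorted, 50) + " vs " + r7(sorted, 50)); // 2.5 vs 2.5
    }
}
```

    On {1, 2, 3, 4}, both rules return 2.5 for the median. The 25th percentile, however, comes out as 1.25 under the rule above and 1.75 under R-7, so the choice of estimation type matters for small samples.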

    Since 2.2, Percentile uses only selection instead of complete sorting, and it caches selection algorithm state between calls to the various evaluate methods. This greatly improves efficiency for both single and multiple percentile computations. To maximize performance when multiple percentiles are computed from the same data, users should set the data array once, using either evaluate(double[], double) or setData(double[]), and thereafter call evaluate(double) with just the percentile.
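    The benefit of caching a copy and selecting instead of sorting can be sketched as below. CachedSelectionPercentile is a hypothetical class. Its setData copies the input once. Each evaluate(p) call then finds the one or two order statistics it needs by running a plain quickselect on that copy (Hoare-style partitioning around the middle element) instead of sorting it. This is a simpler substitute for the library's KthSelector and PivotingStrategy, and unlike Percentile it keeps no pivot cache between calls.

```java
/** Hypothetical sketch: copy the data once, then answer percentile queries by quickselect. */
final class CachedSelectionPercentile {

    private double[] work;   // private copy, partially reordered by each selection

    /** Copies the data once; setData must be called before evaluate. */
    void setData(double[] values) {
        work = values.clone();
    }

    /** Default-rule estimate of the p-th percentile of the cached data (n >= 1, 0 < p <= 100). */
    double evaluate(double p) {
        int n = work.length;
        if (n == 1) {
            return work[0];
        }
        double pos = p * (n + 1) / 100.0;
        if (pos < 1) {
            return select(0);
        }
        if (pos >= n) {
            return select(n - 1);
        }
        int k = (int) Math.floor(pos);
        double lower = select(k - 1);              // 1-based position k is 0-based index k - 1
        double upper = select(k);                  // the next order statistic
        return lower + (pos - k) * (upper - lower);
    }

    /** Quickselect: returns the k-th smallest value (0-based), reordering work in place. */
    private double select(int k) {
        int lo = 0;
        int hi = work.length - 1;
        while (lo < hi) {
            double pivot = work[(lo + hi) >>> 1];
            int i = lo;
            int j = hi;
            while (i <= j) {                       // Hoare-style partition; Double.compare puts NaN last
                while (Double.compare(work[i], pivot) < 0) i++;
                while (Double.compare(work[j], pivot) > 0) j--;
                if (i <= j) {
                    double t = work[i];
                    work[i] = work[j];
                    work[j] = t;
                    i++;
                    j--;
                }
            }
            if (k <= j) {
                hi = j;                            // answer lies in the left part
            } else if (k >= i) {
                lo = i;                            // answer lies in the right part
            } else {
                return work[k];                    // k sits among elements equal to the pivot
            }
        }
        return work[k];
    }
}
```

    Calling setData once and then evaluate(25), evaluate(50), and evaluate(75) reuses the same copy. The caller's array is never modified.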

    Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes a method that modifies its state (for example setData(double[]) or setQuantile(double)), access must be synchronized externally.

    • Nested Class Summary

      Nested Classes
      Modifier and Type Class Description
      static class Percentile.EstimationType
      An enum of the percentile estimation strategies described in the Wikipedia article on quantiles; the enum constant names match the type names used there.
    • Method Summary

      Modifier and Type Method Description
      Percentile copy()
      Returns a copy of the statistic with the same internal state.
      double evaluate(double p)
      Returns the result of evaluating the statistic over the stored data.
      double evaluate(double[] values, double p)
      Returns an estimate of the pth percentile of the values in the values array.
      double evaluate(double[] values, double[] sampleWeights, double p)
      Returns an estimate of the pth percentile of the values in the values array with their weights.
      double evaluate(double[] values, double[] sampleWeights, int start, int length)
      Returns a weighted estimate of the percentile of the designated values in the values array, where the percentile is determined by the quantile property.
      double evaluate(double[] values, double[] sampleWeights, int begin, int length, double p)
      Returns an estimate of the pth percentile of the values in the values array with sampleWeights, starting with the element in (0-based) position begin in the array and including length values.
      double evaluate(double[] values, int start, int length)
      Returns an estimate of the percentile of the designated values in the values array, where the percentile is determined by the quantile property.
      double evaluate(double[] values, int begin, int length, double p)
      Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.
      Percentile.EstimationType getEstimationType()
      Get the estimation type used for computation.
      KthSelector getKthSelector()
      Get the kthSelector used for computation.
      NaNStrategy getNaNStrategy()
      Get the NaN handling strategy used for computation.
      PivotingStrategy getPivotingStrategy()
      Get the PivotingStrategy used in KthSelector for computation.
      double getQuantile()
      Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
      protected double[] getWorkArray(double[] values, double[] sampleWeights, int begin, int length)
      Get the work array of weights to operate on.
      void setData(double[] values)
      Set the data array.
      void setData(double[] values, double[] sampleWeights)
      Set the data and weights arrays.
      void setData(double[] values, double[] sampleWeights, int begin, int length)
      Set the data and weights arrays.
      void setData(double[] values, int begin, int length)
      Set the data array.
      void setQuantile(double p)
      Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
      Percentile withEstimationType(Percentile.EstimationType newEstimationType)
      Build a new instance similar to the current one except for the estimation type.
      Percentile withKthSelector(KthSelector newKthSelector)
      Build a new instance similar to the current one except for the kthSelector instance specifically set.
      Percentile withNaNStrategy(NaNStrategy newNaNStrategy)
      Build a new instance similar to the current one except for the NaN handling strategy.
    • Method Detail

      • setData

        public void setData(double[] values,
                            double[] sampleWeights)
        Set the data and weights arrays.
        Parameters:
        values - Data array. Cannot be null.
        sampleWeights - corresponding positive and non-NaN weights. Cannot be null.
        Throws:
        MathIllegalArgumentException - if lengths of values and weights are not equal.
        NotANumberException - if any weight is NaN
        NotStrictlyPositiveException - if any weight is not positive
      • setData

        public void setData(double[] values,
                            double[] sampleWeights,
                            int begin,
                            int length)
        Set the data and weights arrays. The input array is copied, not referenced.
        Parameters:
        values - Data array. Cannot be null.
        sampleWeights - corresponding positive and non-NaN weights. Cannot be null.
        begin - the index of the first element to include
        length - the number of elements to include
        Throws:
        MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
        NotPositiveException - if begin or length is negative
        NumberIsTooLargeException - if begin + length is greater than values.length
        NotANumberException - if any weight is NaN
        NotStrictlyPositiveException - if any weight is not positive
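        The documented preconditions can be summarized as a validation routine. The sketch below is hypothetical, and WeightedDataCheck is not part of the library. Plain IllegalArgumentException stands in for MathIllegalArgumentException, NotPositiveException, NumberIsTooLargeException, NotANumberException, and NotStrictlyPositiveException. The order of the checks and the choice to validate only the weights inside [begin, begin + length) are assumptions. copySlices also assumes the weights are copied along with the values.

```java
import java.util.Arrays;

/** Hypothetical checks mirroring the documented preconditions of the weighted setData. */
final class WeightedDataCheck {

    private WeightedDataCheck() { }

    static void validate(double[] values, double[] weights, int begin, int length) {
        if (values == null || weights == null) {
            throw new IllegalArgumentException("values and weights must not be null");
        }
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have equal lengths");
        }
        if (begin < 0 || length < 0) {
            throw new IllegalArgumentException("begin and length must not be negative");
        }
        if (length > values.length - begin) {      // overflow-safe form of begin + length > values.length
            throw new IllegalArgumentException("begin + length exceeds values.length");
        }
        for (int i = begin; i < begin + length; i++) {   // assumption: only the slice is checked
            if (Double.isNaN(weights[i])) {
                throw new IllegalArgumentException("weight at index " + i + " is NaN");
            }
            if (weights[i] <= 0) {
                throw new IllegalArgumentException("weight at index " + i + " is not strictly positive");
            }
        }
    }

    /** Validates, then returns copies of the selected slices as {values, weights}. */
    static double[][] copySlices(double[] values, double[] weights, int begin, int length) {
        validate(values, weights, begin, length);
        int end = begin + length;
        return new double[][] {
            Arrays.copyOfRange(values, begin, end),
            Arrays.copyOfRange(weights, begin, end)
        };
    }
}
```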
      • evaluate

        public double evaluate(double[] values,
                               double p)
        Returns an estimate of the pth percentile of the values in the values array.

        Calls to this method do not modify the internal quantile state of this statistic.

        • Returns Double.NaN if values has length 0
        • Returns (for any value of p) values[0] if values has length 1
        • Throws MathIllegalArgumentException if values is null or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)

        See Percentile for a description of the percentile estimation algorithm used.

        Parameters:
        values - input array of values
        p - the percentile value to compute
        Returns:
        the percentile value or Double.NaN if the array is empty
        Throws:
        MathIllegalArgumentException - if values is null or p is invalid
      • evaluate

        public double evaluate(double[] values,
                               double[] sampleWeights,
                               double p)
        Returns an estimate of the pth percentile of the values in the values array with their weights.

        See Percentile for a description of the percentile estimation algorithm used.

        Parameters:
        values - input array of values
        sampleWeights - weights of values
        p - the percentile value to compute
        Returns:
        the weighted percentile value or Double.NaN if the array is empty
        Throws:
        MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
        NotPositiveException - if begin or length is negative
        NotStrictlyPositiveException - if any weight is not positive
        NotANumberException - if any weight is NaN
        OutOfRangeException - if p is invalid
        NumberIsTooLargeException - if begin + length is greater than values.length
      • evaluate

        public double evaluate(double[] values,
                               int start,
                               int length)
        Returns an estimate of the percentile of the designated values in the values array, where the percentile is determined by the quantile property.
        • Returns Double.NaN if length = 0
        • Returns (for any value of quantile) values[start] if length = 1
        • Throws MathIllegalArgumentException if values is null, or start or length is invalid

        See Percentile for a description of the percentile estimation algorithm used.

        Specified by:
        evaluate in interface MathArrays.Function
        Specified by:
        evaluate in interface UnivariateStatistic
        Specified by:
        evaluate in class AbstractUnivariateStatistic
        Parameters:
        values - the input array
        start - index of the first array element to include
        length - the number of elements to include
        Returns:
        the percentile value
        Throws:
        MathIllegalArgumentException - if the parameters are not valid
      • evaluate

        public double evaluate(double[] values,
                               double[] sampleWeights,
                               int start,
                               int length)
        Returns a weighted estimate of the percentile of the designated values in the values array, where the percentile is determined by the quantile property.

        See Percentile for a description of the percentile estimation algorithm used.

        Parameters:
        values - the input array
        sampleWeights - the weights of values
        start - index of the first array element to include
        length - the number of elements to include
        Returns:
        the percentile value
        Throws:
        MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
        NotPositiveException - if start or length is negative
        NotStrictlyPositiveException - if any weight is not positive
        NotANumberException - if any weight is NaN
        OutOfRangeException - if p is invalid
        NumberIsTooLargeException - if start + length is greater than values.length
      • evaluate

        public double evaluate(double[] values,
                               int begin,
                               int length,
                               double p)
        Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.

        Calls to this method do not modify the internal quantile state of this statistic.

        • Returns Double.NaN if length = 0
        • Returns (for any value of p) values[begin] if length = 1
        • Throws MathIllegalArgumentException if values is null, begin or length is invalid, or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)

        See Percentile for a description of the percentile estimation algorithm used.

        Parameters:
        values - array of input values
        begin - the first (0-based) element to include in the computation
        length - the number of array elements to include
        p - the percentile to compute
        Returns:
        the percentile value.
        Throws:
        MathIllegalArgumentException - if the parameters are not valid.
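        The slicing behavior and the special cases listed above can be illustrated with a short sketch. SlicePercentileSketch is hypothetical and throws IllegalArgumentException in place of MathIllegalArgumentException. It sorts a copy of the range [begin, begin + length) rather than using selection. It also assumes the argument checks take precedence over the length = 0 and length = 1 shortcuts.

```java
import java.util.Arrays;

/** Hypothetical sketch of evaluate(values, begin, length, p) on a sub-range. */
final class SlicePercentileSketch {

    private SlicePercentileSketch() { }

    static double evaluate(double[] values, int begin, int length, double p) {
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        if (begin < 0 || length < 0 || length > values.length - begin) {
            throw new IllegalArgumentException("invalid begin or length");
        }
        if (!(p > 0 && p <= 100)) {                 // the negated form also rejects NaN
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100");
        }
        if (length == 0) {
            return Double.NaN;                      // empty range
        }
        if (length == 1) {
            return values[begin];                   // single element, for any valid p
        }
        double[] work = Arrays.copyOfRange(values, begin, begin + length);
        Arrays.sort(work);                          // elements outside the range are ignored
        double pos = p * (length + 1) / 100.0;
        if (pos < 1) {
            return work[0];
        }
        if (pos >= length) {
            return work[length - 1];
        }
        int k = (int) Math.floor(pos);
        return work[k - 1] + (pos - k) * (work[k] - work[k - 1]);
    }
}
```

        For example, evaluate({10, 3, 1, 4, 1, 5, 99}, 1, 5, 50) only considers {3, 1, 4, 1, 5}. The values 10 and 99 at the ends of the array do not affect the result of 3.0.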
      • evaluate

        public double evaluate(double[] values,
                               double[] sampleWeights,
                               int begin,
                               int length,
                               double p)
        Returns an estimate of the pth percentile of the values in the values array with sampleWeights, starting with the element in (0-based) position begin in the array and including length values.

        See Percentile for a description of the percentile estimation algorithm used.

        Parameters:
        values - array of input values
        sampleWeights - positive and non-NaN weights of values
        begin - the first (0-based) element to include in the computation
        length - the number of array elements to include
        p - percentile to compute
        Returns:
        the weighted percentile value
        Throws:
        MathIllegalArgumentException - if lengths of values and weights are not equal or values or weights is null
        NotPositiveException - if begin or length is negative
        NotStrictlyPositiveException - if any weight is not positive
        NotANumberException - if any weight is NaN
        OutOfRangeException - if p is invalid
        NumberIsTooLargeException - if begin + length is greater than values.length
      • getQuantile

        public double getQuantile()
        Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
        Returns:
        the quantile set at construction or by setQuantile(double)
      • setQuantile

        public void setQuantile(double p)
        Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
        Parameters:
        p - a value satisfying 0 < p <= 100
        Throws:
        MathIllegalArgumentException - if p is not greater than 0 and less than or equal to 100
      • getWorkArray

        protected double[] getWorkArray(double[] values,
                                        double[] sampleWeights,
                                        int begin,
                                        int length)
        Get the work array of weights to operate on.
        Parameters:
        values - the array of numbers
        sampleWeights - the array of weights
        begin - index to start reading the array
        length - the length of array to be read from the begin index
        Returns:
        work array sliced from values in the range [begin,begin+length)
      • withEstimationType

        public Percentile withEstimationType(Percentile.EstimationType newEstimationType)
        Build a new instance similar to the current one except for the estimation type.

        This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:

           Percentile customized = new Percentile(quantile).
                                   withEstimationType(estimationType).
                                   withNaNStrategy(nanStrategy).
                                   withKthSelector(kthSelector);
         

        If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.

        Parameters:
        newEstimationType - estimation type for the new instance. Cannot be null.
        Returns:
        a new instance, with changed estimation type
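        The withXxx methods follow a copy-on-modify pattern. Each returns a new instance similar to the current one and leaves the receiver unchanged. Below is a minimal sketch of that pattern. It uses a hypothetical immutable PercentileSettings class whose enums are illustrative stand-ins for Percentile.EstimationType and NaNStrategy, and whose defaults are placeholders, not the library's.

```java
import java.util.Objects;

/** Hypothetical immutable settings object illustrating the withXxx pattern. */
final class PercentileSettings {

    /** Illustrative stand-ins for Percentile.EstimationType and NaNStrategy (subsets only). */
    enum Estimation { LEGACY, R_7 }
    enum NaNHandling { REMOVED, FIXED }

    final double quantile;
    final Estimation estimation;
    final NaNHandling nanHandling;

    /** Placeholder defaults, not necessarily those of the library. */
    PercentileSettings(double quantile) {
        this(quantile, Estimation.LEGACY, NaNHandling.REMOVED);
    }

    private PercentileSettings(double quantile, Estimation estimation, NaNHandling nanHandling) {
        this.quantile = quantile;
        this.estimation = Objects.requireNonNull(estimation, "estimation");    // "Cannot be null"
        this.nanHandling = Objects.requireNonNull(nanHandling, "nanHandling");
    }

    /** Returns a copy with a different estimation type; this instance is unchanged. */
    PercentileSettings withEstimation(Estimation newEstimation) {
        return new PercentileSettings(quantile, newEstimation, nanHandling);
    }

    /** Returns a copy with a different NaN handling; this instance is unchanged. */
    PercentileSettings withNaNHandling(NaNHandling newNaNHandling) {
        return new PercentileSettings(quantile, estimation, newNaNHandling);
    }
}
```

        Because every setting is final and each withXxx call builds a fresh object, a base configuration can be shared safely and specialized several ways without the variants affecting one another.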
      • withNaNStrategy

        public Percentile withNaNStrategy(NaNStrategy newNaNStrategy)
        Build a new instance similar to the current one except for the NaN handling strategy.

        This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:

           Percentile customized = new Percentile(quantile).
                                   withEstimationType(estimationType).
                                   withNaNStrategy(nanStrategy).
                                   withKthSelector(kthSelector);
         

        If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.

        Parameters:
        newNaNStrategy - NaN strategy for the new instance. Cannot be null.
        Returns:
        a new instance, with changed NaN handling strategy
      • withKthSelector

        public Percentile withKthSelector(KthSelector newKthSelector)
        Build a new instance similar to the current one except for the kthSelector instance specifically set.

        This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:

           Percentile customized = new Percentile(quantile).
                                   withEstimationType(estimationType).
                                   withNaNStrategy(nanStrategy).
                                   withKthSelector(newKthSelector);
         

        If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.

        Parameters:
        newKthSelector - KthSelector for the new instance. Cannot be null.
        Returns:
        a new instance, with changed KthSelector