Class ChiSquareTest


  • public class ChiSquareTest
    extends Object
    Implements Chi-Square test statistics.

    This implementation handles both known and unknown distributions.

    Two-sample tests can be used when the distribution is not known a priori but is provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.

    • Constructor Detail

      • ChiSquareTest

        public ChiSquareTest()
        Construct a ChiSquareTest.
    • Method Detail

      • chiSquare

        public double chiSquare​(double[] expected,
                                long[] observed)
                         throws NotPositiveException,
                                NotStrictlyPositiveException,
                                DimensionMismatchException
        Computes the Chi-Square statistic comparing observed and expected frequency counts.

        This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that the observed counts follow the expected distribution.

        Preconditions:

        • Expected counts must all be positive.
        • Observed counts must all be ≥ 0.
        • The observed and expected arrays must have the same length and their common length must be at least 2.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Note: This implementation rescales the expected array if necessary so that the sums of the expected and observed counts are equal.

        Parameters:
        observed - array of observed frequency counts
        expected - array of expected frequency counts
        Returns:
        chiSquare test statistic
        Throws:
        NotPositiveException - if observed has negative entries
        NotStrictlyPositiveException - if expected has entries that are not strictly positive
        DimensionMismatchException - if the arrays have different lengths or their common length is less than 2
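
        The statistic and the rescaling note above can be sketched without the library. The class below is a hypothetical, stdlib-only illustration (the name GoodnessOfFitSketch is not part of this API). It assumes the usual Pearson form, the sum of (observed - expected)² / expected, and always applies the rescaling factor, which equals 1 when the two totals already agree.

```java
final class GoodnessOfFitSketch {

    /**
     * Pearson chi-square statistic after rescaling the expected counts
     * so that they sum to the same total as the observed counts.
     * Illustration only; not the library's implementation.
     */
    static double chiSquare(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected counts must be strictly positive");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must not be negative");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Factor that brings the expected total in line with the observed total.
        double ratio = sumObserved / sumExpected;
        double statistic = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double scaled = ratio * expected[i];
            double dev = observed[i] - scaled;
            statistic += dev * dev / scaled;
        }
        return statistic;
    }

    public static void main(String[] args) {
        // Expected values given as proportions; they are rescaled to 25, 25, 50.
        double[] expected = {0.25, 0.25, 0.5};
        long[] observed = {20, 30, 50};
        System.out.println(chiSquare(expected, observed));
    }
}
```

        Because of the rescaling, the expected array may hold proportions rather than counts, as in the main method above.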
      • chiSquareTest

        public double chiSquareTest​(double[] expected,
                                    long[] observed)
                             throws NotPositiveException,
                                    NotStrictlyPositiveException,
                                    DimensionMismatchException,
                                    MaxCountExceededException
        Returns the observed significance level, or p-value, associated with a Chi-square goodness of fit test comparing the observed frequency counts to those in the expected array.

        The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts.

        Preconditions:

        • Expected counts must all be positive.
        • Observed counts must all be ≥ 0.
        • The observed and expected arrays must have the same length and their common length must be at least 2.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Note: This implementation rescales the expected array if necessary so that the sums of the expected and observed counts are equal.

        Parameters:
        observed - array of observed frequency counts
        expected - array of expected frequency counts
        Returns:
        p-value
        Throws:
        NotPositiveException - if observed has negative entries
        NotStrictlyPositiveException - if expected has entries that are not strictly positive
        DimensionMismatchException - if the arrays have different lengths or their common length is less than 2
        MaxCountExceededException - if an error occurs computing the p-value
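
        To make the p-value concrete, here is a minimal sketch of the upper tail of the chi-square distribution. It is restricted to an even number of degrees of freedom, where the tail has the closed form exp(-x/2) * ∑ (x/2)^i / i! for i = 0 .. df/2 - 1. The class name ChiSquareTailSketch is hypothetical, and the sketch assumes the usual goodness-of-fit convention of (number of bins - 1) degrees of freedom; a general implementation would handle odd values as well.

```java
final class ChiSquareTailSketch {

    /**
     * P(X > x) for a chi-square variable X with an even number of
     * degrees of freedom. Illustration only; odd values are rejected.
     */
    static double upperTailEvenDf(double x, int degreesOfFreedom) {
        if (degreesOfFreedom <= 0 || degreesOfFreedom % 2 != 0) {
            throw new IllegalArgumentException("this sketch needs a positive, even df");
        }
        if (x <= 0.0) {
            return 1.0;
        }
        double half = x / 2.0;
        double term = 1.0; // (x/2)^i / i! for i = 0
        double sum = 1.0;
        for (int i = 1; i < degreesOfFreedom / 2; i++) {
            term *= half / i;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // Three bins give 2 degrees of freedom; a statistic of 2.0 yields exp(-1).
        System.out.println(upperTailEvenDf(2.0, 2));
    }
}
```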
      • chiSquareTest

        public boolean chiSquareTest​(double[] expected,
                                     long[] observed,
                                     double alpha)
                              throws NotPositiveException,
                                     NotStrictlyPositiveException,
                                     DimensionMismatchException,
                                     OutOfRangeException,
                                     MaxCountExceededException
        Performs a Chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

        Example:
        To test the hypothesis that observed follows expected at the 99% level, use

        chiSquareTest(expected, observed, 0.01)

        Preconditions:

        • Expected counts must all be positive.
        • Observed counts must all be ≥ 0.
        • The observed and expected arrays must have the same length and their common length must be at least 2.
        • 0 < alpha ≤ 0.5

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Note: This implementation rescales the expected array if necessary so that the sums of the expected and observed counts are equal.

        Parameters:
        observed - array of observed frequency counts
        expected - array of expected frequency counts
        alpha - significance level of the test
        Returns:
        true iff null hypothesis can be rejected with confidence 1 - alpha
        Throws:
        NotPositiveException - if observed has negative entries
        NotStrictlyPositiveException - if expected has entries that are not strictly positive
        DimensionMismatchException - if the arrays have different lengths or their common length is less than 2
        OutOfRangeException - if alpha is not in the range (0, 0.5]
        MaxCountExceededException - if an error occurs computing the p-value
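
        The boolean variant amounts to comparing a p-value against alpha. The sketch below shows that decision rule under two assumptions: that rejection means p-value < alpha, and that alpha is validated against the (0, 0.5] range given in the Throws list. The class and method names are hypothetical.

```java
final class SignificanceDecisionSketch {

    /**
     * Returns true when the null hypothesis would be rejected at level alpha.
     * Assumes the rule "reject when pValue < alpha"; illustration only.
     */
    static boolean rejectNull(double pValue, double alpha) {
        if (!(alpha > 0.0 && alpha <= 0.5)) {
            throw new IllegalArgumentException("alpha must lie in (0, 0.5]");
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        // A p-value of 0.004 leads to rejection at the 99% level (alpha = 0.01).
        System.out.println(rejectNull(0.004, 0.01));
        // A p-value of 0.37 does not.
        System.out.println(rejectNull(0.37, 0.01));
    }
}
```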
      • chiSquare

        public double chiSquare​(long[][] counts)
                         throws NullArgumentException,
                                NotPositiveException,
                                DimensionMismatchException
        Computes the Chi-Square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.

        The rows of the 2-way table are counts[0], ... , counts[counts.length - 1].

        Preconditions:

        • All counts must be ≥ 0.
        • The sum of each row and column must be > 0.
        • The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
        • The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        If a column or row contains only zeros this is invalid input and a ZeroException is thrown. The empty column/row should be removed from the input counts.

        Parameters:
        counts - array representation of 2-way table
        Returns:
        chiSquare test statistic
        Throws:
        NullArgumentException - if the array is null
        DimensionMismatchException - if the array is not rectangular
        NotPositiveException - if counts has negative entries
        ZeroException - if the sum of a row or column is zero
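
        For a two-way table, the statistic compares each cell with the count expected under independence, rowSum * columnSum / total. The following stdlib-only sketch (hypothetical class IndependenceSketch) illustrates that computation and the preconditions listed above; it is an assumption about the standard formula, not a copy of the library code, and it reports every violation as a plain IllegalArgumentException.

```java
final class IndependenceSketch {

    /**
     * Chi-square statistic for a test of independence on a two-way table.
     * Illustration only; not the library's implementation.
     */
    static double chiSquare(long[][] counts) {
        if (counts == null) {
            throw new IllegalArgumentException("counts must not be null");
        }
        int rows = counts.length;
        if (rows < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("table needs at least 2 rows and 2 columns");
        }
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int r = 0; r < rows; r++) {
            if (counts[r].length != cols) {
                throw new IllegalArgumentException("table must be rectangular");
            }
            for (int c = 0; c < cols; c++) {
                if (counts[r][c] < 0) {
                    throw new IllegalArgumentException("counts must not be negative");
                }
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        for (double s : rowSum) {
            if (s == 0.0) throw new IllegalArgumentException("empty row");
        }
        for (double s : colSum) {
            if (s == 0.0) throw new IllegalArgumentException("empty column");
        }
        double statistic = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Cell count expected if rows and columns were independent.
                double expected = rowSum[r] * colSum[c] / total;
                double dev = counts[r][c] - expected;
                statistic += dev * dev / expected;
            }
        }
        return statistic;
    }

    public static void main(String[] args) {
        long[][] counts = {{10, 20}, {30, 40}};
        System.out.println(chiSquare(counts));
    }
}
```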
      • chiSquareTest

        public double chiSquareTest​(long[][] counts)
                             throws NullArgumentException,
                                    DimensionMismatchException,
                                    NotPositiveException,
                                    MaxCountExceededException
        Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.

        The rows of the 2-way table are counts[0], ... , counts[counts.length - 1].

        Preconditions:

        • All counts must be ≥ 0.
        • The sum of each row and column must be > 0.
        • The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
        • The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        If a column or row contains only zeros this is invalid input and a ZeroException is thrown. The empty column/row should be removed from the input counts.

        Parameters:
        counts - array representation of 2-way table
        Returns:
        p-value
        Throws:
        NullArgumentException - if the array is null
        DimensionMismatchException - if the array is not rectangular
        NotPositiveException - if counts has negative entries
        MaxCountExceededException - if an error occurs computing the p-value
        ZeroException - if the sum of a row or column is zero
      • chiSquareTest

        public boolean chiSquareTest​(long[][] counts,
                                     double alpha)
                              throws NullArgumentException,
                                     DimensionMismatchException,
                                     NotPositiveException,
                                     OutOfRangeException,
                                     MaxCountExceededException
        Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input 2-way table are independent of the rows, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

        The rows of the 2-way table are counts[0], ... , counts[counts.length - 1].

        Example:
        To test the null hypothesis that the counts in counts[0], ... , counts[counts.length - 1] all correspond to the same underlying probability distribution at the 99% level, use

        chiSquareTest(counts, 0.01)

        Preconditions:

        • All counts must be ≥ 0.
        • The sum of each row and column must be > 0.
        • The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
        • The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        If a column or row contains only zeros this is invalid input and a ZeroException is thrown. The empty column/row should be removed from the input counts.

        Parameters:
        counts - array representation of 2-way table
        alpha - significance level of the test
        Returns:
        true iff null hypothesis can be rejected with confidence 1 - alpha
        Throws:
        NullArgumentException - if the array is null
        DimensionMismatchException - if the array is not rectangular
        NotPositiveException - if counts has any negative entries
        OutOfRangeException - if alpha is not in the range (0, 0.5]
        MaxCountExceededException - if an error occurs computing the p-value
        ZeroException - if the sum of a row or column is zero
      • chiSquareDataSetsComparison

        public double chiSquareDataSetsComparison​(long[] observed1,
                                                  long[] observed2)
                                           throws DimensionMismatchException,
                                                  NotPositiveException,
                                                  ZeroException

        Computes a Chi-Square two sample test statistic comparing bin frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is

        ∑[(K * observed1[i] - observed2[i]/K)² / (observed1[i] + observed2[i])] where
        K = √[∑(observed2) / ∑(observed1)]

        This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that both observed counts follow the same distribution.

        Preconditions:

        • Observed counts must be non-negative.
        • Observed counts for a specific bin must not both be zero.
        • Observed counts for a specific sample must not all be 0.
        • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Parameters:
        observed1 - array of observed frequency counts of the first data set
        observed2 - array of observed frequency counts of the second data set
        Returns:
        chiSquare test statistic
        Throws:
        DimensionMismatchException - if the lengths of the arrays do not match
        NotPositiveException - if any entries in observed1 or observed2 are negative
        ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at some index is zero for both arrays
        Since:
        1.2
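
        The two-sample formula above can be written out directly. The sketch below is a hypothetical, stdlib-only illustration (class TwoSampleSketch is not part of this API) of the sum over bins with the weighting factor K = √[∑(observed2) / ∑(observed1)]; it reports every precondition violation as a plain IllegalArgumentException.

```java
final class TwoSampleSketch {

    /**
     * Two-sample chi-square statistic for binned counts whose totals
     * may differ. Illustration only; not the library's implementation.
     */
    static double chiSquareDataSetsComparison(long[] observed1, long[] observed2) {
        if (observed1.length != observed2.length || observed1.length < 2) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("counts must not be negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("both counts are zero at index " + i);
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample must not consist only of zeros");
        }
        // K compensates for samples of different total size.
        double k = Math.sqrt((double) sum2 / (double) sum1);
        double statistic = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            double dev = k * observed1[i] - observed2[i] / k;
            statistic += dev * dev / (observed1[i] + observed2[i]);
        }
        return statistic;
    }

    public static void main(String[] args) {
        // Equal totals, so K = 1 and each bin contributes 10² / 30.
        System.out.println(chiSquareDataSetsComparison(new long[] {10, 20}, new long[] {20, 10}));
    }
}
```

        When one sample is an exact multiple of the other, for example {10, 20, 30} and {20, 40, 60}, every term vanishes up to rounding and the statistic is effectively zero.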
      • chiSquareTestDataSetsComparison

        public double chiSquareTestDataSetsComparison​(long[] observed1,
                                                      long[] observed2)
                                               throws DimensionMismatchException,
                                                      NotPositiveException,
                                                      ZeroException,
                                                      MaxCountExceededException

        Returns the observed significance level, or p-value, associated with a Chi-Square two sample test comparing bin frequency counts in observed1 and observed2.

        The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.

        See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the test statistic. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

        Preconditions:
        • Observed counts must be non-negative.
        • Observed counts for a specific bin must not both be zero.
        • Observed counts for a specific sample must not all be 0.
        • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Parameters:
        observed1 - array of observed frequency counts of the first data set
        observed2 - array of observed frequency counts of the second data set
        Returns:
        p-value
        Throws:
        DimensionMismatchException - if the lengths of the arrays do not match
        NotPositiveException - if any entries in observed1 or observed2 are negative
        ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
        MaxCountExceededException - if an error occurs computing the p-value
        Since:
        1.2
      • chiSquareTestDataSetsComparison

        public boolean chiSquareTestDataSetsComparison​(long[] observed1,
                                                       long[] observed2,
                                                       double alpha)
                                                throws DimensionMismatchException,
                                                       NotPositiveException,
                                                       ZeroException,
                                                       OutOfRangeException,
                                                       MaxCountExceededException

        Performs a Chi-Square two sample test comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

        See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the Chi-Square statistic used in the test. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

        Preconditions:
        • Observed counts must be non-negative.
        • Observed counts for a specific bin must not both be zero.
        • Observed counts for a specific sample must not all be 0.
        • The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
        • 0 < alpha ≤ 0.5

        If any of the preconditions are not met, an IllegalArgumentException is thrown.

        Parameters:
        observed1 - array of observed frequency counts of the first data set
        observed2 - array of observed frequency counts of the second data set
        alpha - significance level of the test
        Returns:
        true iff null hypothesis can be rejected with confidence 1 - alpha
        Throws:
        DimensionMismatchException - if the lengths of the arrays do not match
        NotPositiveException - if any entries in observed1 or observed2 are negative
        ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
        OutOfRangeException - if alpha is not in the range (0, 0.5]
        MaxCountExceededException - if an error occurs performing the test
        Since:
        1.2