Class ChiSquareTest
java.lang.Object
    org.apache.commons.math4.legacy.stat.inference.ChiSquareTest

public class ChiSquareTest extends Object
Implements chi-square test statistics. This implementation handles both known and unknown distributions.
Two-sample tests can be used when the distribution is unknown a priori but provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.


Constructor Summary
ChiSquareTest()
    Construct a ChiSquareTest.

Method Summary
All methods are concrete instance methods.

double chiSquare(double[] expected, long[] observed)
    Computes the chi-square statistic comparing observed and expected frequency counts.
double chiSquare(long[][] counts)
    Computes the chi-square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.
double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
    Computes a chi-square two-sample test statistic comparing bin frequency counts in observed1 and observed2.
double chiSquareTest(double[] expected, long[] observed)
    Returns the observed significance level, or p-value, associated with a chi-square goodness of fit test comparing the observed frequency counts to those in the expected array.
boolean chiSquareTest(double[] expected, long[] observed, double alpha)
    Performs a chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha.
double chiSquareTest(long[][] counts)
    Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.
boolean chiSquareTest(long[][] counts, double alpha)
    Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input two-way table are independent of the rows, with significance level alpha.
double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2)
    Returns the observed significance level, or p-value, associated with a chi-square two-sample test comparing bin frequency counts in observed1 and observed2.
boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha)
    Performs a chi-square two-sample test comparing two binned data sets.



Constructor Detail

ChiSquareTest
public ChiSquareTest()
Construct a ChiSquareTest.


Method Detail

chiSquare
public double chiSquare(double[] expected, long[] observed) throws NotPositiveException, NotStrictlyPositiveException, DimensionMismatchException
Computes the chi-square statistic comparing observed and expected frequency counts. This statistic can be used to perform a chi-square test evaluating the null hypothesis that the observed counts follow the expected distribution.
Preconditions:
 Expected counts must all be positive.
 Observed counts must all be ≥ 0.
 The observed and expected arrays must have the same length and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
Parameters:
 observed - array of observed frequency counts
 expected - array of expected frequency counts
Returns:
 chi-square test statistic
Throws:
 NotPositiveException - if observed has negative entries
 NotStrictlyPositiveException - if expected has entries that are not strictly positive
 DimensionMismatchException - if the array lengths differ or are less than 2
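The statistic described above can be sketched in plain Java as follows. This is a minimal illustration of the documented formula, not the library's source: the class name `GoodnessOfFitSketch` is hypothetical, bad input is reported with plain `IllegalArgumentException`s instead of the library's exception types, and `expected` is always rescaled by sum(observed) / sum(expected), which changes nothing when the two sums already agree.

```java
/** Hypothetical sketch of the goodness-of-fit statistic; not the Commons Math source. */
final class GoodnessOfFitSketch {

    static double chiSquare(double[] expected, long[] observed) {
        if (expected.length < 2 || expected.length != observed.length) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        double sumExpected = 0;
        double sumObserved = 0;
        for (int i = 0; i < expected.length; i++) {
            if (!(expected[i] > 0)) {
                throw new IllegalArgumentException("expected counts must be strictly positive");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Rescale expected so that both arrays have the same total.
        double ratio = sumObserved / sumExpected;
        double sumSq = 0;
        for (int i = 0; i < expected.length; i++) {
            double e = ratio * expected[i];
            double dev = observed[i] - e;
            sumSq += dev * dev / e;
        }
        return sumSq;
    }

    public static void main(String[] args) {
        // Expected proportions 1:1:1 are rescaled to 20 per bin (observed total 60).
        System.out.println(chiSquare(new double[] {1, 1, 1}, new long[] {10, 20, 30}));
    }
}
```

Passing expected proportions rather than counts works because of the rescaling: {1, 1, 1} and {20, 20, 20} give the same statistic, 10.0, for the observed counts above.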

chiSquareTest
public double chiSquareTest(double[] expected, long[] observed) throws NotPositiveException, NotStrictlyPositiveException, DimensionMismatchException, MaxCountExceededException
Returns the observed significance level, or p-value, associated with a chi-square goodness of fit test comparing the observed frequency counts to those in the expected array. The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts.
Preconditions:
 Expected counts must all be positive.
 Observed counts must all be ≥ 0.
 The observed and expected arrays must have the same length and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
Parameters:
 observed - array of observed frequency counts
 expected - array of expected frequency counts
Returns:
 p-value
Throws:
 NotPositiveException - if observed has negative entries
 NotStrictlyPositiveException - if expected has entries that are not strictly positive
 DimensionMismatchException - if the array lengths differ or are less than 2
 MaxCountExceededException - if an error occurs computing the p-value
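The p-value is the upper-tail probability of a chi-square distribution evaluated at the statistic. The sketch below assumes the usual goodness-of-fit choice of expected.length - 1 degrees of freedom and evaluates the tail as the regularized upper incomplete gamma function Q(df/2, x/2). It uses the classic Lanczos log-gamma approximation and the series / continued-fraction pair popularized by Numerical Recipes. `ChiSquarePValueSketch` and its methods are hypothetical stand-ins for the library's own distribution code, so expect agreement to roughly ten significant digits, not bit-for-bit.

```java
/** Hypothetical sketch: chi-square upper-tail probability via the regularized gamma function. */
final class ChiSquarePValueSketch {

    /** Lanczos approximation of ln(Gamma(x)) for x > 0 (about 10 significant digits). */
    static double logGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) {
            ser += c / ++y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Regularized upper incomplete gamma Q(a, x) = 1 - P(a, x). */
    static double gammaQ(double a, double x) {
        if (x <= 0) {
            return 1.0;
        }
        double logPrefix = -x + a * Math.log(x) - logGamma(a);
        if (x < a + 1) {
            // Series expansion of P(a, x); converges quickly when x is small relative to a.
            double ap = a;
            double term = 1.0 / a;
            double sum = term;
            for (int n = 0; n < 1000 && Math.abs(term) > Math.abs(sum) * 1e-15; n++) {
                ap += 1;
                term *= x / ap;
                sum += term;
            }
            return 1.0 - sum * Math.exp(logPrefix);
        }
        // Modified Lentz evaluation of the continued fraction for Q(a, x).
        final double tiny = 1e-300;
        double b = x + 1 - a;
        double c = 1 / tiny;
        double d = 1 / b;
        double h = d;
        for (int i = 1; i < 1000; i++) {
            double an = -i * (i - a);
            b += 2;
            d = an * d + b;
            if (Math.abs(d) < tiny) {
                d = tiny;
            }
            c = b + an / c;
            if (Math.abs(c) < tiny) {
                c = tiny;
            }
            d = 1 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1) < 1e-15) {
                break;
            }
        }
        return Math.exp(logPrefix) * h;
    }

    /** P(X >= statistic) for X chi-square distributed with df degrees of freedom. */
    static double pValue(double statistic, double df) {
        return gammaQ(df / 2, statistic / 2);
    }

    public static void main(String[] args) {
        // Statistic 10.0 from observed {10, 20, 30} vs. expected {20, 20, 20}; df = 3 - 1 = 2.
        // With 2 degrees of freedom the tail is exp(-x / 2), so this is close to exp(-5), about 0.0067.
        System.out.println(pValue(10.0, 2));
    }
}
```

Computing Q directly in the continued-fraction branch, instead of 1 - P, keeps very small p-values from being lost to cancellation.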

chiSquareTest
public boolean chiSquareTest(double[] expected, long[] observed, double alpha) throws NotPositiveException, NotStrictlyPositiveException, DimensionMismatchException, OutOfRangeException, MaxCountExceededException
Performs a chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
Example:
To test the hypothesis that observed follows expected at the 99% level, use chiSquareTest(expected, observed, 0.01).
Preconditions:
 Expected counts must all be positive.
 Observed counts must all be ≥ 0.
 The observed and expected arrays must have the same length and their common length must be at least 2.
 0 < alpha ≤ 0.5
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
Parameters:
 observed - array of observed frequency counts
 expected - array of expected frequency counts
 alpha - significance level of the test
Returns:
 true iff null hypothesis can be rejected with confidence 1 - alpha
Throws:
 NotPositiveException - if observed has negative entries
 NotStrictlyPositiveException - if expected has entries that are not strictly positive
 DimensionMismatchException - if the array lengths differ or are less than 2
 OutOfRangeException - if alpha is not in the range (0, 0.5]
 MaxCountExceededException - if an error occurs computing the p-value
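The boolean overloads reduce to a comparison between the p-value and alpha once alpha has been validated. The minimal sketch below assumes the rule "reject when p-value < alpha" together with the (0, 0.5] range stated under Throws. The class and method names are hypothetical, and the p-value is passed in rather than computed.

```java
/** Hypothetical sketch of the accept/reject decision used by the alpha overloads. */
final class SignificanceDecisionSketch {

    /** Returns true if the null hypothesis is rejected at significance level alpha. */
    static boolean reject(double pValue, double alpha) {
        // Mirrors the documented range check: alpha must lie in (0, 0.5].
        if (!(alpha > 0 && alpha <= 0.5)) {
            throw new IllegalArgumentException("alpha must be in (0, 0.5]: " + alpha);
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        double p = Math.exp(-5); // p-value of statistic 10.0 with 2 degrees of freedom
        System.out.println(reject(p, 0.01));  // true: rejected at the 99% level
        System.out.println(reject(p, 0.005)); // false: not rejected at the 99.5% level
    }
}
```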

chiSquare
public double chiSquare(long[][] counts) throws NullArgumentException, NotPositiveException, DimensionMismatchException
Computes the chi-square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table. The rows of the two-way table are counts[0], ..., counts[counts.length - 1].
Preconditions:
 All counts must be ≥ 0.
 The sum of each row and column must be > 0.
 The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
 The two-way table represented by counts must have at least 2 columns and at least 2 rows.
If any of the preconditions are not met, an IllegalArgumentException is thrown. In particular, a row or column containing only zeros is invalid input and causes a ZeroException; remove such empty rows or columns from counts before calling this method.
Parameters:
 counts - array representation of two-way table
Returns:
 chi-square test statistic
Throws:
 NullArgumentException - if the array is null
 DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or columns
 NotPositiveException - if counts has negative entries
 ZeroException - if the sum of a row or column is zero
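Under independence, the expected count for a cell is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The sketch below implements that textbook construction. `IndependenceStatisticSketch` is a hypothetical name, plain `IllegalArgumentException`s replace the library's exception types, and the (rows - 1) × (columns - 1) degrees of freedom mentioned in the comment are the usual ones for this test, not something stated in the documentation above.

```java
/** Hypothetical sketch of the chi-square statistic for a two-way contingency table. */
final class IndependenceStatisticSketch {

    static double chiSquare(long[][] counts) {
        int rows = counts.length;
        if (rows < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("table must be at least 2 x 2");
        }
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0;
        for (int r = 0; r < rows; r++) {
            if (counts[r].length != cols) {
                throw new IllegalArgumentException("table must be rectangular");
            }
            for (int c = 0; c < cols; c++) {
                if (counts[r][c] < 0) {
                    throw new IllegalArgumentException("counts must be non-negative");
                }
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        double sumSq = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (rowSum[r] == 0 || colSum[c] == 0) {
                    throw new IllegalArgumentException("empty row or column");
                }
                // Expected count under independence: row total * column total / grand total.
                double expected = rowSum[r] * colSum[c] / total;
                double dev = counts[r][c] - expected;
                sumSq += dev * dev / expected;
            }
        }
        return sumSq;
    }

    public static void main(String[] args) {
        long[][] counts = {{10, 20}, {20, 10}};
        // Each expected count is 30 * 30 / 60 = 15; df would be (2 - 1) * (2 - 1) = 1.
        System.out.println(chiSquare(counts));
    }
}
```

For the 2 × 2 table in `main` every cell deviates from 15 by 5, so the statistic is 4 × 25 / 15 = 100 / 15.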

chiSquareTest
public double chiSquareTest(long[][] counts) throws NullArgumentException, DimensionMismatchException, NotPositiveException, MaxCountExceededException
Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table. The rows of the two-way table are counts[0], ..., counts[counts.length - 1].
Preconditions:
 All counts must be ≥ 0.
 The sum of each row and column must be > 0.
 The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
 The two-way table represented by counts must have at least 2 columns and at least 2 rows.
If any of the preconditions are not met, an IllegalArgumentException is thrown. In particular, a row or column containing only zeros is invalid input and causes a ZeroException; remove such empty rows or columns from counts before calling this method.
Parameters:
 counts - array representation of two-way table
Returns:
 p-value
Throws:
 NullArgumentException - if the array is null
 DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or columns
 NotPositiveException - if counts has negative entries
 MaxCountExceededException - if an error occurs computing the p-value
 ZeroException - if the sum of a row or column is zero
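The documentation asks callers to remove empty rows and columns themselves. One way to do that preprocessing step is sketched below. `EmptyMarginFilter.dropEmptyRowsAndColumns` is a hypothetical helper, not part of the library, and it assumes a non-empty, rectangular table of non-negative counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that removes all-zero rows and columns from a two-way table. */
final class EmptyMarginFilter {

    static long[][] dropEmptyRowsAndColumns(long[][] counts) {
        int cols = counts[0].length;
        // With non-negative counts, a column sums to zero iff every entry in it is zero.
        boolean[] keepCol = new boolean[cols];
        for (long[] row : counts) {
            for (int c = 0; c < cols; c++) {
                if (row[c] != 0) {
                    keepCol[c] = true;
                }
            }
        }
        List<long[]> kept = new ArrayList<>();
        for (long[] row : counts) {
            if (Arrays.stream(row).allMatch(v -> v == 0)) {
                continue; // skip all-zero rows
            }
            long[] filtered = new long[cols];
            int n = 0;
            for (int c = 0; c < cols; c++) {
                if (keepCol[c]) {
                    filtered[n++] = row[c];
                }
            }
            kept.add(Arrays.copyOf(filtered, n));
        }
        return kept.toArray(new long[0][]);
    }

    public static void main(String[] args) {
        long[][] counts = {{3, 0, 5}, {0, 0, 0}, {4, 0, 6}};
        System.out.println(Arrays.deepToString(dropEmptyRowsAndColumns(counts))); // [[3, 5], [4, 6]]
    }
}
```

The filtered table still has to meet the other preconditions. In particular, it must keep at least 2 rows and 2 columns to be a valid counts argument.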

chiSquareTest
public boolean chiSquareTest(long[][] counts, double alpha) throws NullArgumentException, DimensionMismatchException, NotPositiveException, OutOfRangeException, MaxCountExceededException
Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input two-way table are independent of the rows, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence. The rows of the two-way table are counts[0], ..., counts[counts.length - 1].
Example:
To test the null hypothesis that the counts in counts[0], ..., counts[counts.length - 1] all correspond to the same underlying probability distribution at the 99% level, use chiSquareTest(counts, 0.01).
Preconditions:
 All counts must be ≥ 0.
 The sum of each row and column must be > 0.
 The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
 The two-way table represented by counts must have at least 2 columns and at least 2 rows.
 0 < alpha ≤ 0.5
If any of the preconditions are not met, an IllegalArgumentException is thrown. In particular, a row or column containing only zeros is invalid input and causes a ZeroException; remove such empty rows or columns from counts before calling this method.
Parameters:
 counts - array representation of two-way table
 alpha - significance level of the test
Returns:
 true iff null hypothesis can be rejected with confidence 1 - alpha
Throws:
 NullArgumentException - if the array is null
 DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or columns
 NotPositiveException - if counts has any negative entries
 OutOfRangeException - if alpha is not in the range (0, 0.5]
 MaxCountExceededException - if an error occurs computing the p-value
 ZeroException - if the sum of a row or column is zero
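The independence tests take a ready-made table in which counts[i][j] is the number of items whose row classification is i and whose column classification is j. When the data start out as paired classifications, the table can be tallied as sketched below. This assumes both classifications are already encoded as zero-based integer indices. `ContingencyTableBuilder.tabulate` is a hypothetical helper, not a library method.

```java
import java.util.Arrays;

/** Hypothetical helper that builds a two-way contingency table from paired category indices. */
final class ContingencyTableBuilder {

    static long[][] tabulate(int[] rowCategory, int[] colCategory, int rows, int cols) {
        if (rowCategory.length != colCategory.length) {
            throw new IllegalArgumentException("paired arrays must have the same length");
        }
        long[][] counts = new long[rows][cols];
        for (int k = 0; k < rowCategory.length; k++) {
            // Each observation increments exactly one cell; out-of-range indices throw.
            counts[rowCategory[k]][colCategory[k]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] group = {0, 0, 0, 1, 1, 1, 1};   // row classification, e.g. treatment group
        int[] outcome = {0, 1, 1, 0, 0, 0, 1}; // column classification, e.g. response
        System.out.println(Arrays.deepToString(tabulate(group, outcome, 2, 2))); // [[1, 2], [3, 1]]
    }
}
```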

chiSquareDataSetsComparison
public double chiSquareDataSetsComparison(long[] observed1, long[] observed2) throws DimensionMismatchException, NotPositiveException, ZeroException
Computes a chi-square two-sample test statistic comparing bin frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is

$$\chi^2 = \sum_i \frac{\left(K \cdot \mathrm{observed1}[i] - \mathrm{observed2}[i] / K\right)^2}{\mathrm{observed1}[i] + \mathrm{observed2}[i]}$$

where

$$K = \sqrt{\frac{\sum_i \mathrm{observed2}[i]}{\sum_i \mathrm{observed1}[i]}}$$

This statistic can be used to perform a chi-square test evaluating the null hypothesis that both observed counts follow the same distribution.
Preconditions:
 Observed counts must be non-negative.
 Observed counts for a specific bin must not both be zero.
 Observed counts for a specific sample must not all be 0.
 The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Parameters:
 observed1 - array of observed frequency counts of the first data set
 observed2 - array of observed frequency counts of the second data set
Returns:
 chi-square test statistic
Throws:
 DimensionMismatchException - if the lengths of the arrays do not match
 NotPositiveException - if any entries in observed1 or observed2 are negative
 ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at some index is zero for both arrays
Since:
 1.2
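The weighting by K is easiest to see in code. The sketch below transcribes the formula above directly. `TwoSampleStatisticSketch` is a hypothetical name, and plain `IllegalArgumentException`s stand in for the library's exception types. The factor K compensates for different sample sizes, so two samples with the same shape but different totals score zero.

```java
/** Hypothetical sketch of the two-sample (binned data sets) chi-square statistic. */
final class TwoSampleStatisticSketch {

    static double chiSquareDataSetsComparison(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        double sum1 = 0;
        double sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("bin " + i + " is empty in both samples");
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("a sample has no observations");
        }
        // K = sqrt(sum(observed2) / sum(observed1)) compensates for different sample sizes.
        double k = Math.sqrt(sum2 / sum1);
        double sumSq = 0;
        for (int i = 0; i < observed1.length; i++) {
            double dev = k * observed1[i] - observed2[i] / k;
            sumSq += dev * dev / (observed1[i] + observed2[i]);
        }
        return sumSq;
    }

    public static void main(String[] args) {
        // Same shape, different totals (60 vs. 120): the statistic is zero up to rounding.
        System.out.println(chiSquareDataSetsComparison(new long[] {10, 20, 30}, new long[] {20, 40, 60}));
        // Equal totals, so K = 1 and the statistic is 100/30 + 100/30.
        System.out.println(chiSquareDataSetsComparison(new long[] {10, 20}, new long[] {20, 10}));
    }
}
```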

chiSquareTestDataSetsComparison
public double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2) throws DimensionMismatchException, NotPositiveException, ZeroException, MaxCountExceededException
Returns the observed significance level, or p-value, associated with a chi-square two-sample test comparing bin frequency counts in observed1 and observed2. The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.
See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the test statistic. The number of degrees of freedom used by the test is one less than the common length of the input observed count arrays.
Preconditions:
 Observed counts must be non-negative.
 Observed counts for a specific bin must not both be zero.
 Observed counts for a specific sample must not all be 0.
 The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Parameters:
 observed1 - array of observed frequency counts of the first data set
 observed2 - array of observed frequency counts of the second data set
Returns:
 p-value
Throws:
 DimensionMismatchException - if the lengths of the arrays do not match
 NotPositiveException - if any entries in observed1 or observed2 are negative
 ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
 MaxCountExceededException - if an error occurs computing the p-value
Since:
 1.2
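Assuming the standard relationship between a chi-square statistic and its upper-tail probability, the p-value can be written with the regularized upper incomplete gamma function Q. Here n is the common array length, so the test has n - 1 degrees of freedom as stated above:

```latex
p = \Pr\left(\chi^2_{n-1} \ge \chi^2_{\mathrm{obs}}\right)
  = Q\!\left(\frac{n-1}{2}, \frac{\chi^2_{\mathrm{obs}}}{2}\right)
  = \frac{\Gamma\!\left(\frac{n-1}{2}, \frac{\chi^2_{\mathrm{obs}}}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)}
```

For example, with n = 3 bins (2 degrees of freedom) this reduces to p = exp(-χ²_obs / 2), because Γ(1, x) = e^(-x) and Γ(1) = 1.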

chiSquareTestDataSetsComparison
public boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha) throws DimensionMismatchException, NotPositiveException, ZeroException, OutOfRangeException, MaxCountExceededException
Performs a chi-square two-sample test comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the chi-square statistic used in the test. The number of degrees of freedom used by the test is one less than the common length of the input observed count arrays.
Preconditions:
 Observed counts must be non-negative.
 Observed counts for a specific bin must not both be zero.
 Observed counts for a specific sample must not all be 0.
 The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
 0 < alpha ≤ 0.5
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Parameters:
 observed1 - array of observed frequency counts of the first data set
 observed2 - array of observed frequency counts of the second data set
 alpha - significance level of the test
Returns:
 true iff null hypothesis can be rejected with confidence 1 - alpha
Throws:
 DimensionMismatchException - if the lengths of the arrays do not match
 NotPositiveException - if any entries in observed1 or observed2 are negative
 ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
 OutOfRangeException - if alpha is not in the range (0, 0.5]
 MaxCountExceededException - if an error occurs performing the test
Since:
 1.2

