Class SimpleRegression
 java.lang.Object

 org.apache.commons.math4.legacy.stat.regression.SimpleRegression

 All Implemented Interfaces:
UpdatingMultipleLinearRegression
public class SimpleRegression extends Object implements UpdatingMultipleLinearRegression
Estimates an ordinary least squares regression model with one independent variable.
y = intercept + slope * x
Standard errors for intercept and slope are available as well as ANOVA, r-square and Pearson's r statistics.
Observations (x,y pairs) can be added to the model one at a time or they can be provided in a 2-dimensional array. The observations are not stored in memory, so there is no limit to the number of observations that can be added to the model.
Usage Notes:
 When there are fewer than two observations in the model, or when there is no variation in the x values (i.e. all x values are the same), all statistics return NaN. At least two observations with different x coordinates are required to estimate a bivariate regression model.
 Getters for the statistics always compute values based on the current set of observations, i.e., you can get statistics, then add more data and get updated statistics without using a new instance. There is no "compute" method that updates all statistics. Each of the getters performs the necessary computations to return the requested statistic.
 The intercept term may be suppressed by passing false to the SimpleRegression(boolean) constructor. When the hasIntercept property is false, the model is estimated without a constant term and getIntercept() returns 0.


Constructor Summary
Constructor
 Description
SimpleRegression()
 Create an empty SimpleRegression instance.
SimpleRegression(boolean includeIntercept)
 Create a SimpleRegression instance, specifying whether or not to estimate an intercept.

Method Summary
Modifier and Type, Method
 Description
void addData(double[][] data)
 Adds the observations represented by the elements in data.
void addData(double x, double y)
 Adds the observation (x,y) to the regression data set.
void addObservation(double[] x, double y)
 Adds one observation to the regression model.
void addObservations(double[][] x, double[] y)
 Adds a series of observations to the regression model.
void append(SimpleRegression reg)
 Appends data from another regression calculation to this one.
void clear()
 Clears all data from the model.
double getIntercept()
 Returns the intercept of the estimated regression line, if hasIntercept() is true; otherwise 0.
double getInterceptStdErr()
 Returns the standard error of the intercept estimate, usually denoted s(b0).
double getMeanSquareError()
 Returns the sum of squared errors divided by the degrees of freedom, usually abbreviated MSE.
long getN()
 Returns the number of observations that have been added to the model.
double getR()
 Returns Pearson's product moment correlation coefficient, usually denoted r.
double getRegressionSumSquares()
 Returns the sum of squared deviations of the predicted y values about their mean (which equals the mean of y).
double getRSquare()
 Returns the coefficient of determination, usually denoted r-square.
double getSignificance()
 Returns the significance level of the slope (equiv.) correlation.
double getSlope()
 Returns the slope of the estimated regression line.
double getSlopeConfidenceInterval()
 Returns the half-width of a 95% confidence interval for the slope estimate.
double getSlopeConfidenceInterval(double alpha)
 Returns the half-width of a (100-100*alpha)% confidence interval for the slope estimate.
double getSlopeStdErr()
 Returns the standard error of the slope estimate, usually denoted s(b1).
double getSumOfCrossProducts()
 Returns the sum of cross-products, x_i*y_i.
double getSumSquaredErrors()
 Returns the sum of squared errors (SSE) associated with the regression model.
double getTotalSumSquares()
 Returns the sum of squared deviations of the y values about their mean.
double getXSumSquares()
 Returns the sum of squared deviations of the x values about their mean.
boolean hasIntercept()
 Returns true if the model includes an intercept term.
double predict(double x)
 Returns the "predicted" y value associated with the supplied x value, based on the data that has been added to the model when this method is invoked.
RegressionResults regress()
 Performs a regression on data present in buffers and outputs a RegressionResults object.
RegressionResults regress(int[] variablesToInclude)
 Performs a regression on data present in buffers including only the regressors indexed in variablesToInclude.
void removeData(double[][] data)
 Removes observations represented by the elements in data.
void removeData(double x, double y)
 Removes the observation (x,y) from the regression data set.



Constructor Detail

SimpleRegression
public SimpleRegression()
Create an empty SimpleRegression instance.

SimpleRegression
public SimpleRegression(boolean includeIntercept)
Create a SimpleRegression instance, specifying whether or not to estimate an intercept.
Use false to estimate a model with no intercept. When the hasIntercept property is false, the model is estimated without a constant term and getIntercept() returns 0.
 Parameters:
includeIntercept
 whether or not to include an intercept term in the regression model


Method Detail

addData
public void addData(double x, double y)
Adds the observation (x,y) to the regression data set.
Uses updating formulas for means and sums of squares defined in "Algorithms for Computing the Sample Variance: Analysis and Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. 1983, American Statistician, vol. 37, pp. 242-247, referenced in Weisberg, S. "Applied Linear Regression". 2nd Ed. 1985.
 Parameters:
x
 independent variable value
y
 dependent variable value
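The idea behind one-pass updating of means and sums of squares can be illustrated with a small self-contained sketch. The class StreamingOlsSketch and its field names are hypothetical and written for illustration only; this is not the library's source, and it covers only the case of a model with an intercept.

```java
/** Hypothetical sketch: one-pass accumulation of the sums needed for simple OLS. */
final class StreamingOlsSketch {
    private long n;       // number of observations seen so far
    private double xbar;  // running mean of x
    private double ybar;  // running mean of y
    private double sxx;   // sum of squared deviations of x about its mean
    private double sxy;   // sum of products of x and y mean deviations

    /** Folds one (x, y) pair into the running means and sums. */
    void addData(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;           // deviation from the mean before this point
            double dy = y - ybar;
            double weight = n / (n + 1.0);  // n is the count before this point
            sxx += dx * dx * weight;
            sxy += dx * dy * weight;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    /** Slope estimate, or NaN when fewer than two distinct x values have been seen. */
    double getSlope() {
        return (n < 2 || sxx == 0.0) ? Double.NaN : sxy / sxx;
    }

    /** Intercept estimate derived from the running means and the slope. */
    double getIntercept() {
        return ybar - getSlope() * xbar;
    }
}
```

Because only the count, two means and two sums are kept, memory use does not grow with the number of observations, which matches the class-level remark that observations are not stored.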

append
public void append(SimpleRegression reg)
Appends data from another regression calculation to this one.
The mean update formulae are based on a paper written by Philippe Pébay: Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments, 2008, Technical Report SAND2008-6212, Sandia National Laboratories.
 Parameters:
reg
 model to append data from
 Since:
 3.3
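As a rough illustration of how two sets of regression sums can be combined without revisiting the raw data, the sketch below uses the standard pairwise formulas for means and co-moments. The class MergeableSumsSketch and its members are hypothetical names chosen for this example; the actual implementation may differ in detail.

```java
/** Hypothetical sketch: combining two sets of regression sums without revisiting the data. */
final class MergeableSumsSketch {
    long n;
    double xbar, ybar, sxx, sxy;

    /** Builds the sums for a batch of (x, y) pairs with a plain two-pass computation. */
    static MergeableSumsSketch of(double[][] data) {
        MergeableSumsSketch s = new MergeableSumsSketch();
        s.n = data.length;
        for (double[] p : data) {
            s.xbar += p[0] / s.n;
            s.ybar += p[1] / s.n;
        }
        for (double[] p : data) {
            s.sxx += (p[0] - s.xbar) * (p[0] - s.xbar);
            s.sxy += (p[0] - s.xbar) * (p[1] - s.ybar);
        }
        return s;
    }

    /** Folds the sums of another instance into this one. */
    void append(MergeableSumsSketch other) {
        if (other.n == 0) {
            return; // nothing to merge
        }
        long total = n + other.n;
        double dx = other.xbar - xbar;  // difference between the two batch means
        double dy = other.ybar - ybar;
        double weight = (double) n * other.n / total;
        sxx += other.sxx + dx * dx * weight;
        sxy += other.sxy + dx * dy * weight;
        xbar += dx * other.n / total;
        ybar += dy * other.n / total;
        n = total;
    }

    /** Slope estimate from the accumulated sums. */
    double slope() {
        return sxy / sxx;
    }
}
```

Merging the sums of two batches this way should give the same slope, up to rounding, as accumulating all points in a single instance.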

removeData
public void removeData(double x, double y)
Removes the observation (x,y) from the regression data set.
Mirrors the addData method. This method permits the use of SimpleRegression instances in streaming mode where the regression is applied to a sliding "window" of observations; however, the caller is responsible for maintaining the set of observations in the window.
The method has no effect if there are no points of data (i.e. n=0).
 Parameters:
x
 independent variable value
y
 dependent variable value
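The sliding-window pattern described above can be sketched as follows. Since the caller must keep track of which observations are in the window, the sketch stores them in a deque and evicts the oldest point before adding a new one. SlidingWindowSketch and all of its members are hypothetical; the private add and remove methods are a simplified stand-in for the accumulator, not the library's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: regression over the most recent observations only. */
final class SlidingWindowSketch {
    private final int capacity;
    private final Deque<double[]> window = new ArrayDeque<>(); // caller-side copy of the window
    private long n;
    private double xbar, ybar, sxx, sxy;

    SlidingWindowSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Adds a point, first evicting the oldest one if the window is full. */
    void push(double x, double y) {
        if (window.size() == capacity) {
            double[] oldest = window.removeFirst();
            remove(oldest[0], oldest[1]);
        }
        window.addLast(new double[] {x, y});
        add(x, y);
    }

    private void add(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar, dy = y - ybar, w = n / (n + 1.0);
            sxx += dx * dx * w;
            sxy += dx * dy * w;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    /** Inverse of add; does nothing when the model is empty. */
    private void remove(double x, double y) {
        if (n == 0) {
            return;
        }
        if (n == 1) { // removing the last point resets everything
            n = 0;
            xbar = ybar = sxx = sxy = 0.0;
            return;
        }
        double dx = x - xbar, dy = y - ybar, w = n / (n - 1.0);
        sxx -= dx * dx * w;
        sxy -= dx * dy * w;
        xbar -= dx / (n - 1.0);
        ybar -= dy / (n - 1.0);
        n--;
    }

    /** Slope over the current window, or NaN if it cannot be estimated. */
    double slope() {
        return (n < 2 || sxx == 0.0) ? Double.NaN : sxy / sxx;
    }
}
```

Repeated add and remove cycles can accumulate rounding error in the sums, so a long-running window may benefit from an occasional rebuild from the stored points.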

addData
public void addData(double[][] data) throws ModelSpecificationException
Adds the observations represented by the elements in data.
(data[0][0],data[0][1]) will be the first observation, then (data[1][0],data[1][1]), etc.
This method does not replace data that has already been added. The observations represented by data are added to the existing dataset.
To replace all data, use clear() before adding the new data.
 Parameters:
data
 array of observations to be added
 Throws:
ModelSpecificationException
 if the length of data[i] is not greater than or equal to 2

addObservation
public void addObservation(double[] x, double y) throws ModelSpecificationException
Adds one observation to the regression model.
 Specified by:
addObservation in interface UpdatingMultipleLinearRegression
 Parameters:
x
 the independent variables which form the design matrix
y
 the dependent or response variable
 Throws:
ModelSpecificationException
 if the length of x does not equal the number of independent variables in the model

addObservations
public void addObservations(double[][] x, double[] y) throws ModelSpecificationException
Adds a series of observations to the regression model. The lengths of x and y must be the same and x must be rectangular.
 Specified by:
addObservations in interface UpdatingMultipleLinearRegression
 Parameters:
x
 a series of observations on the independent variables
y
 a series of observations on the dependent variable. The length of x and y must be the same.
 Throws:
ModelSpecificationException
 if x is not rectangular, does not match the length of y or does not contain sufficient data to estimate the model

removeData
public void removeData(double[][] data)
Removes observations represented by the elements in data.
If the array is larger than the current n, only the first n elements are processed. This method permits the use of SimpleRegression instances in streaming mode where the regression is applied to a sliding "window" of observations; however, the caller is responsible for maintaining the set of observations in the window.
To remove all data, use clear().
 Parameters:
data
 array of observations to be removed

clear
public void clear()
Clears all data from the model.
 Specified by:
clear in interface UpdatingMultipleLinearRegression

getN
public long getN()
Returns the number of observations that have been added to the model.
 Specified by:
getN in interface UpdatingMultipleLinearRegression
 Returns:
 n, the number of observations that have been added

predict
public double predict(double x)
Returns the "predicted" y value associated with the supplied x value, based on the data that has been added to the model when this method is invoked.
predict(x) = intercept + slope * x
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Parameters:
x
 input x value
 Returns:
 predicted y value

getIntercept
public double getIntercept()
Returns the intercept of the estimated regression line, if hasIntercept() is true; otherwise 0.
The least squares estimate of the intercept is computed using the normal equations. The intercept is sometimes denoted b0.
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Returns:
 the intercept of the regression line if the model includes an intercept; 0 otherwise
 See Also:
SimpleRegression(boolean)

hasIntercept
public boolean hasIntercept()
Returns true if the model includes an intercept term.
 Specified by:
hasIntercept in interface UpdatingMultipleLinearRegression
 Returns:
 true if the regression includes an intercept; false otherwise
 See Also:
SimpleRegression(boolean)

getSlope
public double getSlope()
Returns the slope of the estimated regression line.
The least squares estimate of the slope is computed using the normal equations. The slope is sometimes denoted b1.
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Returns:
 the slope of the regression line
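For reference, the solution of the normal equations for a model with an intercept is commonly written in the closed form below. The symbols S_XX and S_XY denote the sums of squared x deviations and of x and y cross deviations about their means; this is the standard textbook form and is given as an aid to reading, not as a description of the implementation's internals.

```latex
\begin{aligned}
S_{XX} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \\
b_1 &= \frac{S_{XY}}{S_{XX}}, \qquad
b_0 = \bar{y} - b_1 \bar{x}.
\end{aligned}
```

When all x values are equal, S_XX is zero and b1 is undefined, which is consistent with the NaN result described in the preconditions.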

getSumSquaredErrors
public double getSumSquaredErrors()
Returns the sum of squared errors (SSE) associated with the regression model.
The sum is computed using the computational formula
SSE = SYY - (SXY * SXY / SXX)
where SYY is the sum of the squared deviations of the y values about their mean, SXX is similarly defined and SXY is the sum of the products of x and y mean deviations.
The sums are accumulated using the updating algorithm referenced in addData(double, double).
The return value is constrained to be non-negative, i.e., if due to rounding errors the computational formula returns a negative result, 0 is returned.
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Returns:
 sum of squared errors associated with the regression model
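The computational formula and the clamping at zero can be sketched in a few lines. The class SseSketch below is a hypothetical illustration that computes the sums with a plain two-pass loop over arrays rather than the updating algorithm; it assumes a model with an intercept.

```java
/** Hypothetical sketch: SSE via the computational formula, clamped at zero. */
final class SseSketch {
    /** Returns SYY - SXY^2 / SXX, or NaN when no model can be estimated. */
    static double sumSquaredErrors(double[] x, double[] y) {
        int n = x.length;
        if (n < 2) {
            return Double.NaN; // too few observations
        }
        double xbar = 0.0, ybar = 0.0;
        for (int i = 0; i < n; i++) {
            xbar += x[i];
            ybar += y[i];
        }
        xbar /= n;
        ybar /= n;
        double sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xbar;
            double dy = y[i] - ybar;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        if (sxx == 0.0) {
            return Double.NaN; // no variation in x
        }
        // Rounding can push the difference slightly below zero; clamp it.
        return Math.max(0.0, syy - sxy * sxy / sxx);
    }
}
```

For perfectly collinear points the two terms cancel almost exactly, which is the situation where the clamp matters.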

getTotalSumSquares
public double getTotalSumSquares()
Returns the sum of squared deviations of the y values about their mean.
This is defined as SSTO here.
If n < 2, this returns Double.NaN.
 Returns:
 sum of squared deviations of y values

getXSumSquares
public double getXSumSquares()
Returns the sum of squared deviations of the x values about their mean.
If n < 2, this returns Double.NaN.
 Returns:
 sum of squared deviations of x values

getSumOfCrossProducts
public double getSumOfCrossProducts()
Returns the sum of cross-products, x_i*y_i.
 Returns:
 sum of cross products

getRegressionSumSquares
public double getRegressionSumSquares()
Returns the sum of squared deviations of the predicted y values about their mean (which equals the mean of y).
This is usually abbreviated SSR or SSM. It is defined as SSM here.
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Returns:
 sum of squared deviations of predicted y values

getMeanSquareError
public double getMeanSquareError()
Returns the sum of squared errors divided by the degrees of freedom, usually abbreviated MSE.
If there are fewer than three data pairs in the model, or if there is no variation in x, this returns Double.NaN.
 Returns:
 the mean square error (sum of squared errors divided by the degrees of freedom)

getR
public double getR()
Returns Pearson's product moment correlation coefficient, usually denoted r.
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Returns:
 Pearson's r

getRSquare
public double getRSquare()
Returns the coefficient of determination, usually denoted r-square.
Preconditions:
 At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
 Returns:
 r-square
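As a reading aid, the usual textbook relationship between r-square, Pearson's r and the sums of squares is shown below for a model with an intercept, where SSTO = SSR + SSE. This states the standard identities only and assumes nothing about how the values are computed internally.

```latex
r^2 = \frac{\mathrm{SSR}}{\mathrm{SSTO}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SSTO}}, \qquad
r = \operatorname{sign}(b_1)\,\sqrt{r^2}
```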

getInterceptStdErr
public double getInterceptStdErr()
Returns the standard error of the intercept estimate, usually denoted s(b0).
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
Additionally, a Double.NaN is returned when the intercept is constrained to be zero.
 Returns:
 standard error associated with intercept estimate

getSlopeStdErr
public double getSlopeStdErr()
Returns the standard error of the slope estimate, usually denoted s(b1).
If there are fewer than three data pairs in the model, or if there is no variation in x, this returns Double.NaN.
 Returns:
 standard error associated with slope estimate
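The textbook formulas for the two standard errors in a model with an intercept are s(b1) = sqrt(MSE / SXX) and s(b0) = sqrt(MSE * (1/n + xbar^2 / SXX)), with MSE = SSE / (n - 2). The sketch below evaluates them directly from arrays; StdErrSketch is a hypothetical helper written for this example and is not part of the library.

```java
/** Hypothetical sketch: textbook standard errors for simple regression with an intercept. */
final class StdErrSketch {
    /** Returns {s(b0), s(b1)}, or NaN entries when they cannot be estimated. */
    static double[] standardErrors(double[] x, double[] y) {
        int n = x.length;
        if (n < 3) {
            return new double[] {Double.NaN, Double.NaN}; // no residual degrees of freedom
        }
        double xbar = 0.0, ybar = 0.0;
        for (int i = 0; i < n; i++) {
            xbar += x[i];
            ybar += y[i];
        }
        xbar /= n;
        ybar /= n;
        double sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xbar;
            double dy = y[i] - ybar;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        if (sxx == 0.0) {
            return new double[] {Double.NaN, Double.NaN}; // no variation in x
        }
        double sse = Math.max(0.0, syy - sxy * sxy / sxx);
        double mse = sse / (n - 2); // n - 2 degrees of freedom with an intercept
        double seSlope = Math.sqrt(mse / sxx);
        double seIntercept = Math.sqrt(mse * (1.0 / n + xbar * xbar / sxx));
        return new double[] {seIntercept, seSlope};
    }
}
```

The requirement of at least three observations follows from the n - 2 divisor: with exactly two points the fit is exact and no residual variance can be estimated.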

getSlopeConfidenceInterval
public double getSlopeConfidenceInterval() throws OutOfRangeException
Returns the half-width of a 95% confidence interval for the slope estimate.
The 95% confidence interval is
(getSlope() - getSlopeConfidenceInterval(), getSlope() + getSlopeConfidenceInterval())
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
Usage Note:
The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
 Returns:
 half-width of 95% confidence interval for the slope estimate
 Throws:
OutOfRangeException
 if the confidence interval cannot be computed.

getSlopeConfidenceInterval
public double getSlopeConfidenceInterval(double alpha) throws OutOfRangeException
Returns the half-width of a (100-100*alpha)% confidence interval for the slope estimate.
The (100-100*alpha)% confidence interval is
(getSlope() - getSlopeConfidenceInterval(alpha), getSlope() + getSlopeConfidenceInterval(alpha))
To request, for example, a 99% confidence interval, use alpha = .01
Usage Note:
The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
Preconditions:
 If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
 (0 < alpha < 1); otherwise an OutOfRangeException is thrown.
 Parameters:
alpha
 the desired significance level
 Returns:
 half-width of the (100-100*alpha)% confidence interval for the slope estimate
 Throws:
OutOfRangeException
 if the confidence interval cannot be computed.
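For orientation, the half-width of a two-sided confidence interval for the slope is conventionally the product of a Student t quantile and the slope's standard error. The expression below is the standard textbook form for a model with an intercept (n - 2 degrees of freedom); it is offered as an assumption about what the returned value represents, not as a transcription of the implementation.

```latex
\text{half-width} = t_{1-\alpha/2,\; n-2} \cdot s(b_1), \qquad
s(b_1) = \sqrt{\frac{\mathrm{MSE}}{S_{XX}}}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2}
```

With alpha = 0.05 this gives the 95% interval returned by the no-argument overload.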

getSignificance
public double getSignificance()
Returns the significance level of the slope (equiv.) correlation.
Specifically, the returned value is the smallest alpha such that the slope confidence interval with significance level equal to alpha does not include 0. On regression output, this is often denoted Prob(|t| > 0)
Usage Note:
The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
 Returns:
 significance level for slope/correlation
 Throws:
MaxCountExceededException
 if the significance level cannot be computed.

regress
public RegressionResults regress() throws ModelSpecificationException, NoDataException
Performs a regression on data present in buffers and outputs a RegressionResults object.
If there are fewer than 3 observations in the model and hasIntercept is true, a NoDataException is thrown. If there is no intercept term, the model must contain at least 2 observations.
 Specified by:
regress in interface UpdatingMultipleLinearRegression
 Returns:
 RegressionResults acts as a container of regression output
 Throws:
ModelSpecificationException
 if the model is not correctly specified
NoDataException
 if there is not sufficient data in the model to estimate the regression parameters

regress
public RegressionResults regress(int[] variablesToInclude) throws MathIllegalArgumentException
Performs a regression on data present in buffers including only the regressors indexed in variablesToInclude and outputs a RegressionResults object.
 Specified by:
regress in interface UpdatingMultipleLinearRegression
 Parameters:
variablesToInclude
 an array of indices of regressors to include
 Returns:
 RegressionResults acts as a container of regression output
 Throws:
MathIllegalArgumentException
 if the variablesToInclude array is null or zero length
OutOfRangeException
 if a requested variable is not present in model

