14 Least squares

14.1 Overview

The least squares package fits a parametric model to a set of observed values by minimizing a cost function with a specific form. The fitting basically consists in finding the values of some parameters p_k such that a cost function J = sum(w_i (target_i - model_i)^2) is minimized. The various (target_i - model_i(p_k)) terms are called residuals. They represent the deviation between a set of target values target_i and theoretical values computed from models model_i depending on free parameters p_k. The w_i factors are weights. One classical use case is when the target values are experimental observations or measurements.

Two engines devoted to least-squares problems are available. The first one is based on the Gauss-Newton method. The second one is the Levenberg-Marquardt method.

14.2 LeastSquaresBuilder and LeastSquaresFactory

In order to solve a least-squares fitting problem, the user must provide the following elements:

- a way to compute the model function values and their Jacobian (see section 14.3),
- the target (observed) values,
- the start values for the parameters,
- a maximum number of evaluations and a maximum number of iterations (see section 14.5),
- a convergence checker (see section 14.5).
The elements of the list above can be provided as an implementation of the LeastSquaresProblem interface. However, this is cumbersome to do directly, so some helper classes are available. The first helper is a mutable builder: LeastSquaresBuilder. The second helper is a utility factory: LeastSquaresFactory.
The builder class is better suited when setting the various elements of the least squares problem is done progressively in different places in the user code. In this case, the user would first create an empty builder and configure it progressively by calling its methods (start, model, target, maxEvaluations, maxIterations ...). Once the configuration is complete, calling the build method generates the least squares problem.
The factory utility is better suited when the various elements of the least squares problem are all known in one place and the problem can be built in just one sweep, by calling one of the static LeastSquaresFactory.create methods.

14.3 Model Function

The model function is used by the least squares engine to evaluate the model components model_i given some test parameters p_k. It is therefore a multivariate function (it depends on the various p_k) and it is vector-valued (it has several components model_i). There must be exactly one component model_i for each target (or observed) component target_i, otherwise some residuals to be squared and summed could not be computed. In order for the problem to be well defined, the number of parameters p_k must be smaller than the number of components model_i. Failing to ensure this may lead to the engine throwing an exception, as the underlying linear algebra operations may encounter singular matrices. It is not unusual to have a large number of components (several thousands) and only a dozen parameters. There are no limitations on these numbers, though.

As the least squares engine needs to create Jacobian matrices for the model function, both its value and its derivatives with respect to the p_k parameters must be available. There are two ways to provide this:

- provide one MultivariateVectorFunction for computing the values and one MultivariateMatrixFunction for computing the Jacobian,
- provide a single MultivariateJacobianFunction that computes both the values and the Jacobian in one call.
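To make the shape of such a model function concrete, here is a minimal plain-Java sketch (illustrative only, not the library API) of a vector-valued model with its Jacobian, for a hypothetical linear model model_i(p) = p0 + p1 * x_i:

```java
public class LinearModelSketch {

    // abscissas of the observations (hypothetical data, for illustration only)
    static final double[] X = {1.0, 2.0, 3.0};

    // model values: one component model_i = p0 + p1 * x_i per observation
    static double[] value(double[] p) {
        double[] v = new double[X.length];
        for (int i = 0; i < X.length; i++) {
            v[i] = p[0] + p[1] * X[i];
        }
        return v;
    }

    // Jacobian: row i holds d(model_i)/d(p0) = 1 and d(model_i)/d(p1) = x_i
    // (for a linear model the Jacobian does not depend on p)
    static double[][] jacobian(double[] p) {
        double[][] j = new double[X.length][2];
        for (int i = 0; i < X.length; i++) {
            j[i][0] = 1.0;
            j[i][1] = X[i];
        }
        return j;
    }

    public static void main(String[] args) {
        double[] v = value(new double[] {0.5, 2.0});
        System.out.println(v[0] + " " + v[1] + " " + v[2]); // 2.5 4.5 6.5
    }
}
```

Note that there are three observations but only two parameters, so the problem is well defined in the sense described above.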
There are no requirements on how to compute the value and the derivatives. The DerivativeStructure class may be useful for computing derivatives analytically in difficult cases, but this class is not mandated by the API, which only expects the derivatives as a Jacobian matrix containing primitive double entries.
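Since only a matrix of primitive doubles is expected, the derivatives can even be approximated numerically when analytical expressions are impractical. The following plain-Java forward-difference sketch (a hand-rolled helper, not part of the library) illustrates the idea:

```java
import java.util.function.Function;

public class FiniteDifferenceSketch {

    // forward-difference approximation of the Jacobian of a vector-valued model:
    // column k is (model(p + h e_k) - model(p)) / h
    static double[][] jacobian(Function<double[], double[]> model, double[] p, double h) {
        double[] base = model.apply(p);
        double[][] j = new double[base.length][p.length];
        for (int k = 0; k < p.length; k++) {
            double[] shifted = p.clone();
            shifted[k] += h;
            double[] v = model.apply(shifted);
            for (int i = 0; i < base.length; i++) {
                j[i][k] = (v[i] - base[i]) / h;
            }
        }
        return j;
    }

    public static void main(String[] args) {
        // model with components (p0 * p0, p0 * p1);
        // exact Jacobian rows are (2 p0, 0) and (p1, p0)
        Function<double[], double[]> model =
                p -> new double[] { p[0] * p[0], p[0] * p[1] };
        double[][] j = jacobian(model, new double[] {3.0, 2.0}, 1.0e-7);
        System.out.println(j[0][0] + " " + j[1][1]); // close to 6.0 and 3.0
    }
}
```

Forward differences trade accuracy for simplicity; analytical derivatives, when available, usually help the solver converge faster and more reliably.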
One non-obvious feature provided by both the builder and the factory is lazy evaluation. This feature allows deferring calls to the model functions until they are really needed by the engine. This can save some calls for engines that evaluate the value and the Jacobian in different loops (this is the case for Levenberg-Marquardt). However, lazy evaluation is possible only if the model functions are themselves separated, i.e. it can be used only with the first alternative above.

14.4 Parameters Validation

In some cases, the model function requires parameters to lie within a specific domain. For example, a parameter may be used in a square root and needs to be positive, or another parameter represents the sine of an angle and should be within -1 and +1, or several parameters may need to remain in the unit circle and the sum of their squares must be smaller than 1. The least squares solvers available in Apache Commons Math currently do not allow setting up constraints on the parameters. This is a known missing feature. There are two ways to circumvent this. Both ways are achieved by setting up a ParameterValidator instance. The input of the value and Jacobian model functions will always be the output of the parameter validator if one exists.
One way to constrain parameters is to use a continuous mapping between the parameters that the least squares solver will handle and the real parameters of the mathematical model. Using mapping functions such as an exponential (to keep a parameter positive) or a sine (to keep a parameter within -1 and +1), the solver sees only unconstrained parameters, while the model function always receives parameters that lie within their domain.

Another way to constrain parameters is to simply truncate the parameters back to the domain when one search point escapes from it, and not care about derivatives. This works only if the solution is expected to be inside the domain and not at the boundary, as points out of the domain will only be temporary test points with a cost function higher than the real solution and will soon be dropped by the underlying engine. As a rule of thumb, these conditions are met only when the domain boundaries correspond to unrealistic values that will never be achieved (null distances, negative masses, ...), but they will not be met when the domain boundaries are more operational limits (a maximum weight that can be handled by a device, a minimum temperature that can be sustained by an instrument, ...).

14.5 Tuning

Among the elements to be provided to the least squares problem builder or factory are some tuning parameters for the solver.
The maximum number of iterations refers to the engine algorithm main loop, whereas the maximum number of evaluations refers to the number of calls to the model method. Some algorithms (like Levenberg-Marquardt) have two embedded loops, with the iteration number being incremented at the outer loop level, but a new evaluation being done at each inner loop. In this case, the number of evaluations will be greater than the number of iterations. Other algorithms (like Gauss-Newton) have only one level of loops. In this case, the number of evaluations will be equal to the number of iterations. In any case, the maximum numbers are really only intended as safeguards to prevent infinite loops, so the exact value of the limit is not important; it is common to select some almost arbitrary number much larger than the expected number of evaluations and use it for both maxEvaluations and maxIterations.
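For intuition about these counters, here is a schematic one-parameter Gauss-Newton loop in plain Java (purely illustrative, not the library's implementation) for the hypothetical model model_i = p * x_i; since Gauss-Newton has a single loop level, each iteration costs exactly one model evaluation and both counters stay equal:

```java
public class GaussNewtonSketch {

    // one Gauss-Newton step for the scalar model model_i = p * x_i:
    // delta = (J^T J)^-1 J^T r, with J_i = x_i and residuals r_i = target_i - p * x_i
    static double step(double p, double[] x, double[] target) {
        double jtj = 0.0;
        double jtr = 0.0;
        for (int i = 0; i < x.length; i++) {
            jtj += x[i] * x[i];
            jtr += x[i] * (target[i] - p * x[i]);
        }
        return p + jtr / jtj; // scalar form of the normal equations
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] target = {2.0, 4.0, 6.0}; // consistent with p = 2
        double p = 0.0;                    // start point
        int iterations = 0;
        int evaluations = 0;
        // single-level loop: one evaluation per iteration, so the two
        // safeguard limits would be interchangeable here
        while (iterations < 10) {
            double next = step(p, x, target);
            iterations++;
            evaluations++;
            boolean converged = Math.abs(next - p) < 1.0e-12;
            p = next;
            if (converged) {
                break;
            }
        }
        System.out.println("p = " + p + ", " + iterations
                + " iterations, " + evaluations + " evaluations");
    }
}
```

Because this toy model is linear, the very first step already lands on the least squares solution; the second iteration merely detects convergence.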
Convergence checking is delegated to a dedicated interface: ConvergenceChecker, parameterized with the Evaluation class used by least squares problems.
14.6 Optimization Engine
Once the least squares problem has been created, using either the builder or the factory,
it is passed to an optimization engine for solving. Two engines devoted to least-squares
problems are available. The first one is
based on the
Gauss-Newton method. The second one is the
Levenberg-Marquardt method. Both for increased readability and in order to accommodate possible future changes in the configuration, it is recommended to use the fluent-style API to build and configure the optimizers. This means creating a first temporary version of the optimizer with a default parameterless constructor, and then setting up the various configuration parameters using the available withXxx methods, each of which returns a new configured instance. As an example, setting up a Levenberg-Marquardt optimizer with custom cost and parameter relative tolerances reads:

  LeastSquaresOptimizer optimizer = new LevenbergMarquardtOptimizer().
                                    withCostRelativeTolerance(1.0e-12).
                                    withParameterRelativeTolerance(1.0e-12);

As another example, setting up a Gauss-Newton optimizer and forcing the decomposition to SVD (the default is QR decomposition) would be done as follows:

  LeastSquaresOptimizer optimizer = new GaussNewtonOptimizer().
                                    withDecomposition(GaussNewtonOptimizer.Decomposition.SVD);

14.7 Solving
Solving the least squares problem is done by calling the optimize method of the optimization engine:

  LeastSquaresOptimizer.Optimum optimum = optimizer.optimize(leastSquaresProblem);

The LeastSquaresOptimizer.Optimum class is a specialized Evaluation with additional methods to retrieve the number of evaluations and number of iterations performed. The most important methods are inherited from the Evaluation class and correspond to the point (i.e. the parameters), cost, Jacobian, RMS, covariance ...

14.8 Example

The following simple example shows how to find the center of a circle of known radius to best fit observed 2D points. It is a simplified version of one of the JUnit test cases. In the complete test case, both the circle center and its radius are fitted; here the radius is fixed.

  final double radius = 70.0;
  final Cartesian2D[] observedPoints = new Cartesian2D[] {
      new Cartesian2D( 30.0,  68.0),
      new Cartesian2D( 50.0,  -6.0),
      new Cartesian2D(110.0, -20.0),
      new Cartesian2D( 35.0,  15.0),
      new Cartesian2D( 45.0,  97.0)
  };

  // the model function components are the distances to current estimated center,
  // they should be as close as possible to the specified radius
  MultivariateJacobianFunction distancesToCurrentCenter = new MultivariateJacobianFunction() {
      public Pair<RealVector, RealMatrix> value(final RealVector point) {

          Cartesian2D center = new Cartesian2D(point.getEntry(0), point.getEntry(1));

          RealVector value = new ArrayRealVector(observedPoints.length);
          RealMatrix jacobian = new Array2DRowRealMatrix(observedPoints.length, 2);

          for (int i = 0; i < observedPoints.length; ++i) {
              Cartesian2D o = observedPoints[i];
              double modelI = Cartesian2D.distance(o, center);
              value.setEntry(i, modelI);
              // derivative with respect to p0 = x center
              jacobian.setEntry(i, 0, (center.getX() - o.getX()) / modelI);
              // derivative with respect to p1 = y center
              jacobian.setEntry(i, 1, (center.getY() - o.getY()) / modelI);
          }

          return new Pair<RealVector, RealMatrix>(value, jacobian);

      }
  };

  // the target is to have all points at the specified radius from the center
  double[] prescribedDistances = new double[observedPoints.length];
  Arrays.fill(prescribedDistances, radius);

  // least squares problem to solve : modeled radius should be close to target radius
  LeastSquaresProblem problem = new LeastSquaresBuilder().
                                start(new double[] { 100.0, 50.0 }).
                                model(distancesToCurrentCenter).
                                target(prescribedDistances).
                                lazyEvaluation(false).
                                maxEvaluations(1000).
                                maxIterations(1000).
                                build();

  LeastSquaresOptimizer.Optimum optimum = new LevenbergMarquardtOptimizer().optimize(problem);
  Cartesian2D fittedCenter = new Cartesian2D(optimum.getPoint().getEntry(0),
                                             optimum.getPoint().getEntry(1));
  System.out.println("fitted center: " + fittedCenter.getX() + " " + fittedCenter.getY());
  System.out.println("RMS: " + optimum.getRMS());
  System.out.println("evaluations: " + optimum.getEvaluations());
  System.out.println("iterations: " + optimum.getIterations());
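As a sanity check on the analytical derivatives used in the example, the Jacobian entry (center_x - o_x) / distance can be compared against a finite-difference approximation in plain Java (a standalone check, independent of the library):

```java
public class CircleJacobianCheck {

    // distance from the estimated center (cx, cy) to an observed point (ox, oy)
    static double dist(double cx, double cy, double ox, double oy) {
        double dx = cx - ox;
        double dy = cy - oy;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        double cx = 100.0, cy = 50.0; // start point used in the example
        double ox = 30.0,  oy = 68.0; // first observed point
        double d = dist(cx, cy, ox, oy);
        // analytical derivative of the distance with respect to the center x
        double analytical = (cx - ox) / d;
        // forward-difference approximation of the same derivative
        double h = 1.0e-7;
        double numerical = (dist(cx + h, cy, ox, oy) - d) / h;
        System.out.println(analytical + " vs " + numerical);
    }
}
```

The two values agree to several digits, confirming the Jacobian formula used by the model function.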