Nabla computes the derivatives by applying the classical
differentiation rules at bytecode level. When an instance
of a class implementing UnivariateFunction
is passed to its differentiate
method, Nabla
tracks the mathematical operations flow that leads from the
t
parameter to the return value of the
function. At the bytecode instructions level, all operations
are elementary ones. Each elementary operation is then
changed to compute both the value and the derivatives. Nothing is
changed to the control flow instructions (loops, branches,
operations scheduling).
Analysis and transformation of the bytecode is realized using both
the core API and the tree API from the asm
bytecode manipulation and analysis framework. The entry point of this differentiation
process is the
differentiate method of the MethodDifferentiator
which is called from the
ForwardModeDifferentiator class for processing the f
method of the user class.
All the changed operations belong to a small subset of the
virtual machine instructions. This set contains basic arithmetic
operations (addition, subtraction ...), conversion operations
(double to int, long to double ...), storage instructions (local
variables, functions parameters, instance or class fields ...)
and calls to elementary functions defined in the Math
,
StrictMath
, FastMath
or similar classes.
There is really nothing more! For each one of these basic bytecode
instructions, Nabla knows how to map them to a mathematical equation and
how to hand these equations to a class that will compute derivatives.
Lets consider the DADD
bytecode instruction and consider
only first derivative for now. This instruction corresponds
to the addition of two real numbers and produces a third number
which is their sum. Nabla maps the instruction to the equation:
c=a+b
and calls the
DerivativeStructure class provided by Apache Commons Math to
compute both the value and the first derivative:
(c=a+b, c'=a'+b')
In this example, the DerivativeStructure
class uses only
the linearity property of differentiation which implies that the
derivative of a sum is the sum of the derivatives. Similar rules exist
for all arithmetic instructions. The derivative of all basic functions
in the Math
, StrictMath
and FastMath
are known. The rules are also known for any derivation order, they are
not limited to first order. In fact, Nabla itself does not know any
of these rules, all computations are delegated to
DerivativeStructure.
Lets consider a more extensive example:
public class Linear implements UnivariateFunction { public double value(double t) { double result = 1; for (int i = 0; i < 3; ++i) { result = result * (2 * t + 1); } return result; } }
In this example, the only things that need to be changed for
differentiating the value
method are
the t
parameter and the result
local variable which must be adapted to hold the derivative
structure, the two multiplications and the addition. In
order to generate the new function, Nabla will convert these
elements one at a time, starting from the parameter change
and propagating the change using a simple data flow
analysis. There is no need to analyze the global structure
of the code, and no need to change anything in the loop
handling instructions above. The method generated for the
example above will be roughly similar to the one that would
result from compiling the code below (except that Nabla
generates bytecode directly, it does not use source at all):
public DerivativeStructure value(DerivativeStructure t) { // source roughly equivalent to conversion of the result initialization DerivativeStructure result = new DerivativeStructure(t.getFreeParameters(), t.getOrder(), 1.0); // this loop handling code is not changed at all for (int i = 0; i < 3; ++i) { // source roughly equivalent to conversion of "2 * ..." DerivativeStructure tmpA = t.multiply(2.0); // source roughly equivalent to conversion of "... + 1" DerivativeStructure tmpB = tmpA.add(1.0); // source roughly equivalent to conversion of "result * ..." DerivativeStructure tmpC = result.multiply(tmpB); // source roughly equivalent to conversion of "result = ..." result = tmpC; } // source equivalent to code generated at method exit return result; }
This example shows that the instructions conversions have local scope.
Another thing that is shown in the previous example is that
DerivativeStructure
instances appear everywhere
a double that depends on the input parameter appears, like the
result variable and all the temporary variables. However,
double that do not depend on the input parameter like the
2.0 and 1.0 literal constants remain primitive double values.
The code above also shows that the generated code does not depend
on the derivation order or number or free parameters. In fact, this
information is only carried at runtime by the DerivativeStructure
instance provided as an unput parameters, and the intermediate instances
created on the fly will automatically share these values (see the construction
of the result
variable and the call to getFreeParameters
and getOrder
).
As shown in the example above, the original method bytecode
contains both immutable parts that must be preserved and
parts belonging to what we will call the computation path
from the t
parameter to the result that must
be differentiated. The instructions that belong to the computation
path and hence that must be converted are identified by a data
flow analysis seeded with the t
parameter.
The first step of this data flow analysis is to link each data element (either stack cell or local variable, as explained in the virtual machine execution model section) with the instructions that may produce it and the instructions that may consume it. This task is realized by the TrackingInterpreter and TrackingValue classes.
As explained in the double
converted to DerivativeStructure section of the usage documentation,
the signature of the value
method is changed. The primitive
double
t
parameter in the original method is changed during
the differentiation process to a DerivativeStructure
instance in the generated derivative.
All instructions that used the original primitive double
parameter must be changed to cope with the new DerivativeStructure
local variables. In order to do this, the representation of the t
parameter in the bytecode is marked as pending conversion from one primitive
double
to a DerivativeStructure
. Once this data element has
been marked, the data flow will propagate the mark to other data
elements (both variables and operand stack cells) thanks to the following
rules:
double
data element produced by a changed
instruction must be marked as pending conversion from one primitive
double
to a DerivativeStructure
What is the validity of this approach?
For straightforward smooth functions, the expanded code really computes both the value of the equation and its exact derivatives. This is a simple application of the differentiation rules. So the accuracy of the derivative will be in par with the accuracy of the initial function. If the initial function is a good model of a physical process, the derivative will be a good evaluation of its evolution. If the initial function is only an approximation of a real physical model, then the derivative will be an approximation too, but an approximation that is consistent with the initial function up to computer accuracy.
If the initial function is not smooth, the singular points must be analyzed specially. Some design choices are involved which have an impact on validity. These choices have been made in such a way that in some sense, the result is still as valid, as accurate and as consistent with the initial function as for smooth functions. This point is explained in details in the section about singularities.