Nabla computes the derivatives applying the classical
differentiation rules at bytecode level. When an instance
of a class implementing UnivariateDifferentiable
is passed to its differentiate method, Nabla
tracks the mathematical operations flow that leads from the
t parameter to the return value of the
function. At the bytecode instructions level, the operations
are elementary ones. Each elementary operation is then
changed to compute both a value and a derivative. Nothing is
changed to the control flow instructions (loops, branches,
operations scheduling). The entry point of this differentiation
process is the
visitEnd
method of the MethodDifferentiator
which is called from the
ForwardAlgorithmicDifferentiator
class for processing the f
method of the user class.
All the changed operations belong to a small subset of the
virtual machine instructions. This set contains basic arithmetic
operations (addition, subtraction ...), conversion operations
(double to int, long to double ...), storage instructions (local
variables, functions parameters, instance or class fields ...)
and calls to elementary functions defined in the Math
and StrictMath classes. There is really nothing more!
For each one of these basic bytecode instructions, we know how to
map it to a mathematical equation and we can combine this equation
with its derivative to form a pair of equations we will use later.
For example, a DADD bytecode instruction corresponds
to the addition of two real numbers and produces a third number
which is their sum. So we map the instruction to the equation:
c=a+b
and we combine this with its derivative to form the pair:
(c=a+b, c'=a'+b')
In this example, we have simply used the linearity property of
differentiation which implies that the derivative of a sum is the
sum of the derivatives. Similar rules exist for all arithmetic
instructions, and the derivative of all basic functions in the
Math and StrictMath is known. The
complete rules set is described in the Differentiation rules
section
below.
Lets consider a more extensive example:
public class Linear implements UnivariateDifferentiable {
public double f(double t) {
double result = 1;
for (int i = 0; i < 3; ++i) {
result = result * (2 * t + 1);
}
return result;
}
}
In this example, the only things that need to be changed for
differentiating the f method are
the t parameter and the result
local variable which must be adapted to hold the value and
the derivative, the two multiplications and the addition. In
order to generate the new function, Nabla will convert these
elements one at a time, starting from the parameter change
and propagating the change using a simple data flow
analysis. There is no need to analyze the global structure
of the code, and no need to change anything in the loop
handling instructions above. The method generated for the
example above will be roughly similar to the one that would
result from compiling the code below (except that Nabla
generates bytecode directly, it does not use source at all):
public DifferentialPair f(DifferentialPair t) {
// source roughly equivalent to code generated at method entry
double t0 = t.getValue();
double t1 = t.getFirstDerivative();
// source roughly equivalent to conversion of the result affectation
double result0 = 1;
double result1 = 0;
// this loop handling code is not changed at all
for (int i = 0; i < 3; ++i) {
// source roughly equivalent to conversion of "2 * ..."
double tmpA0 = 2 * t0;
double tmpA1 = 2 * t1;
// source roughly equivalent to conversion of "... + 1"
tmpA0 += 1;
// source roughly equivalent to conversion of "result * ..."
double tmpB0 = result0 * tmpA0;
double tmpB1 = result0 * tmpA1 + result1 * tmpA0;
// source roughly equivalent to conversion of "result = ..."
result0 = tmpB0;
result1 = tmpB1;
}
// source equivalent to code generated at method exit
return new DifferentialPair(result0, result1);
}
This example shows that the instructions conversions have local scope. No global tree representation of the method is needed at all.
Another thing that is shown in the previous example is that
DifferentialPair instances appear only at method
entry and exit. They are not used internally for elementary
conversions, only pairs of local variables (or operand stack
cells as we will see later) are used.
For differentiable functions that call other differentiable functions,
additional instances of DifferentialPair will be used.
Some will be built in the caller to pass parameters to the callee, and
others will be build in the callee to return results to the caller.
This is not the case for calls to the elementary mathematical
functions defined in the Math and StrictMath
classes. For the known functions the derivative computation are inlined.
For example a call to the Math.cos function will be inlined
as a call to Math.cos, a call to Math.sin
and some intermediate arithmetic operations.
The only methods from the DifferentialPair class that
are used by the algorithmic differentiator are the two arguments
constructor, the getValue method and the
getFirstDerivative method. As far as algorithmic
differentiation is concerned, all the other methods could be removed
from the class. In fact, they are only present for user convenience
if they wish to perform additional computation after a function has
been differentiated.
As shown in the example above, the original method bytecode
contains both immutable parts that must be preserved and
parts belonging to what we will call the computation path
from the t parameter to the result that must
be differentiated. The instructions that belong to the computation
path and hence that must be converted are identified by a data
flow analysis seeded with the t parameter.
The first step of this data flow analysis is to link each data element (either stack cell or local variable, as explained in the virtual machine execution model section) with the instructions that may produce it and the instructions that may consume it. This task is realized by the TrackingInterpreter and TrackingValue classes.
As explained in the Differential
pairs
section of the usage documentation, the signature of
the f method is changed. The primitive doublet parameter in the original method is changed during
the differentiation process to a DifferentialPair
instance in the generated derivative. This instance is itself
expanded right from the start of the method into a pair of primitive
double local variables that can be used throughout the
method code.
All instructions that used the original primitive double
parameter must be changed to cope with the new pair of primitive
double local variables. In order to do this, the
representation of the t parameter in the bytecode is
marked as pending conversion from one primitive double
to a pair of primitive doubles. Once this data element has
been marked, the data flow will propagate the mark to other data
elements (both variables and operand stack cells) thanks to the following
rules:
double data element produced by a changed
instruction must be marked as pending conversion from one primitive
double to a pair of primitive doublesWhat is the validity of this approach?
For straightforward smooth functions, the expanded code really computes both the value of the equation and its exact derivative. This is a simple application of the differentiation rules. So the accuracy of the derivative will be in par with the accuracy of the initial function. If the initial function is a good model of a physical process, the derivative will be a good evaluation of its evolution. If the initial function is only an approximation of a real physical model, then the derivative will be an approximation too, but an approximation that is consistent with the initial function up to computer accuracy.
If the initial function is not smooth, the singular points must be analyzed specially. Some design choices are involved which have an impact on validity. These choices have been made in such a way that in some sense, the result is still as valid, as accurate and as consistent with the initial function as for smooth functions. This point is explained in details in the section about singularities .