Nabla computes the derivatives applying the classical
differentiation rules at bytecode level. When an instance
of a class implementing UnivariateDifferentiable
is passed to its differentiate method, Nabla
tracks the mathematical operations flow that leads from the
t parameter to the return value of the
function. At the bytecode instructions level, the operations
are elementary ones. Each elementary operation is then
changed to compute both a value and a derivative. Nothing is
changed to the control flow instructions (loops, branches,
operations scheduling).
All the changed operations belong to a small subset of the
virtual machine instructions. This set contains basic arithmetic
operations (addition, subtraction ...), conversion operations
(double to int, long to double ...), storage instructions (local
variables, functions parameters, instance or class fields ...)
and calls to elementary functions defined in the Math
and StrictMath classes. There is really nothing more!
For each one of these basic bytecode instructions, we know how to
map it to a mathematical equation and we can combine this equation
with its derivative to form a pair of equations we will use later.
For example, a DADD bytecode instruction corresponds
to the addition of two real numbers and produces a third number
which is their sum. So we map the instruction to the equation:
c=a+b
and we combine this with its derivative to form the pair:
(c=a+b, c'=a'+b')
In this example, we have simply used the linearity property of
differentiation which implies that the derivative of a sum is the
sum of the derivatives. Similar rules exist for all arithmetic
instructions, and the derivative of all basic functions in the
Math and StrictMath is known. The
complete rules set is described in the Differentiation rules
section
below.
Lets consider a more extensive example:
public class Linear implements UnivariateDifferentiable {
public double f(double t) {
double result = 1;
for (int i = 0; i < 3; ++i) {
result = result * (2 * t + 1);
}
return result;
}
}
In this example, the only things that need to be changed for
differentiating the f method are
the t parameter and the result
local variable which must be adapted to hold the value and
the derivative, the two multiplications and the addition. In
order to generate the new function, Nabla will convert these
elements one at a time, starting from the parameter change
and propagating the change using a simple data flow
analysis. There is no need to analyze the global structure
of the code, and no need to change anything in the loop
handling instructions above. The method generated for the
example above will be roughly similar to the one that would
result from compiling the code below (except that Nabla
generates bytecode directly, it does not use source at all):
public DifferentialPair f(DifferentialPair t) {
// source roughly equivalent to code generated at method entry
double t0 = t.getValue();
double t1 = t.getFirstDerivative();
// source roughly equivalent to conversion of the result affectation
double result0 = 1;
double result1 = 0;
// this loop handling code is not changed at all
for (int i = 0; i < 3; ++i) {
// source roughly equivalent to conversion of "2 * ..."
double tmpA0 = 2 * t0;
double tmpA1 = 2 * t1;
// source roughly equivalent to conversion of "... + 1"
tmpA0 += 1;
// source roughly equivalent to conversion of "result * ..."
double tmpB0 = result0 * tmpA0;
double tmpB1 = result0 * tmpA1 + result1 * tmpA0;
// source roughly equivalent to conversion of "result = ..."
result0 = tmpB0;
result1 = tmpB1;
}
// source equivalent to code generated at method exit
return new DifferentialPair(result0, result1);
}
This example shows that the instructions conversions have local scope. No global tree representation of the method is needed at all.
Another thing that is shown in the previous example is that
DifferentialPair instances appear only at method
entry and exit. They are not used internally for elementary
conversions, only pairs of local variables (or operand stack
cells as we will see later) are used.
The only methods from the DifferentialPair class that
are used by the automatic differentiator are the two arguments
constructor, the getValue method and the
getFirstDerivative method.
For differentiable functions that call other differentiable functions,
additional instances of DifferentialPair will be used.
Some will be built in the caller to pass parameters to the callee, and
others will be build in the callee to return results to the caller.
This is not the case for calls to the elementary mathematical
functions defined in the Math and StrictMath
classes. For the known functions the derivative computation are inlined.
For example a call to the Math.cos function will be inlined
as a call to Math.cos, a call to Math.sin
and some intermediate arithmetic operations.
As shown in the example above, the original method bytecode
contains both immutable parts that must be preserved and
parts belonging to what we will call the computation path
from the t parameter to the result that must
be differentiated. The instructions that must be converted are
identified by a data flow analysis seeded with the t
parameter, which is changed from a primitive double
in the original method to a DifferentialPair in the
generated derivative, as explained in the Differential pairs
section
of the usage documentation.
TODO: explain analysis by "tainted" data and types propagation.
What is the validity of this approach?
For straightforward smooth functions, the expanded code really computes both the value of the equation and its exact derivative. This is a simple application of the differentiation rules. So the accuracy of the derivative will be in par with the accuracy of the initial function. If the initial function is a good model of a physical process, the derivative will be a good evaluation of its evolution. If the initial function is only an approximation of a real physical model, then the derivative will be an approximation too, but an approximation that is consistent with the initial function up to computer accuracy.
If the initial function is not smooth, the singular points must be analyzed specially. Some design choices are involved which have an impact on validity. These choices have been made in such a way that in some sense, the result is still as valid, as accurate and as consistent with the initial function as for smooth functions. This point is explained in details in the section about singularities .