Principles

Differentiation at bytecode instructions level

Nabla computes the derivatives applying the classical differentiation rules at bytecode level. When an instance of a class implementing UnivariateDifferentiable is passed to its differentiate method, Nabla tracks the mathematical operations flow that leads from the t parameter to the return value of the function. At the bytecode instructions level, the operations are elementary ones. Each elementary operation is then changed to compute both a value and a derivative. Nothing is changed to the control flow instructions (loops, branches, operations scheduling).

All the changed operations belong to a small subset of the virtual machine instructions. This set contains basic arithmetic operations (addition, subtraction ...), conversion operations (double to int, long to double ...), storage instructions (local variables, functions parameters, instance or class fields ...) and calls to elementary functions defined in the Math and StrictMath classes. There is really nothing more! For each one of these basic bytecode instructions, we know how to map it to a mathematical equation and we can combine this equation with its derivative to form a pair of equations we will use later. For example, a DADD bytecode instruction corresponds to the addition of two real numbers and produces a third number which is their sum. So we map the instruction to the equation:

c=a+b
and we combine this with its derivative to form the pair:
(c=a+b, c'=a'+b')
In this example, we have simply used the linearity property of differentiation which implies that the derivative of a sum is the sum of the derivatives. Similar rules exist for all arithmetic instructions, and the derivative of all basic functions in the Math and StrictMath is known. The complete rules set is described in the Differentiation rules section below.

Lets consider a more extensive example:

          public class Linear implements UnivariateDifferentiable {
              public double f(double t) {
                  double result = 1;
                  for (int i = 0; i < 3; ++i) {
                      result = result * (2 * t + 1);
                  }
                  return result;
              }
          }
        

In this example, the only things that need to be changed for differentiating the f method are the t parameter and the result local variable which must be adapted to hold the value and the derivative, the two multiplications and the addition. In order to generate the new function, Nabla will convert these elements one at a time, starting from the parameter change and propagating the change using a simple data flow analysis. There is no need to analyze the global structure of the code, and no need to change anything in the loop handling instructions above. The method generated for the example above will be roughly similar to the one that would result from compiling the code below (except that Nabla generates bytecode directly, it does not use source at all):

          public DifferentialPair f(DifferentialPair t) {

              // source roughly equivalent to code generated at method entry
              double t0 = t.getValue();
              double t1 = t.getFirstDerivative();

              // source roughly equivalent to conversion of the result affectation
              double result0  = 1;
              double result1  = 0;

              // this loop handling code is not changed at all
              for (int i = 0; i < 3; ++i) {

                  // source roughly equivalent to conversion of "2 * ..."
                  double tmpA0 = 2 * t0;
                  double tmpA1 = 2 * t1;

                  // source roughly equivalent to conversion of "... + 1"
                  tmpA0 += 1;

                  // source roughly equivalent to conversion of "result * ..."
                  double tmpB0 = result0 * tmpA0;
                  double tmpB1 = result0 * tmpA1 + result1 * tmpA0;

                  // source roughly equivalent to conversion of "result = ..."
                  result0 = tmpB0;
                  result1 = tmpB1;

              }

              // source equivalent to code generated at method exit
              return new DifferentialPair(result0, result1);

          }
        

This example shows that the instructions conversions have local scope. No global tree representation of the method is needed at all.

No intermediate DifferentialPair instances

Another thing that is shown in the previous example is that DifferentialPair instances appear only at method entry and exit. They are not used internally for elementary conversions, only pairs of local variables (or operand stack cells as we will see later) are used.

The only methods from the DifferentialPair class that are used by the automatic differentiator are the two arguments constructor, the getValue method and the getFirstDerivative method.

For differentiable functions that call other differentiable functions, additional instances of DifferentialPair will be used. Some will be built in the caller to pass parameters to the callee, and others will be build in the callee to return results to the caller. This is not the case for calls to the elementary mathematical functions defined in the Math and StrictMath classes. For the known functions the derivative computation are inlined. For example a call to the Math.cos function will be inlined as a call to Math.cos, a call to Math.sin and some intermediate arithmetic operations.

Data flow analysis

As shown in the example above, the original method bytecode contains both immutable parts that must be preserved and parts belonging to what we will call the computation path from the t parameter to the result that must be differentiated. The instructions that must be converted are identified by a data flow analysis seeded with the t parameter, which is changed from a primitive double in the original method to a DifferentialPair in the generated derivative, as explained in the Differential pairs section of the usage documentation.

TODO: explain analysis by "tainted" data and types propagation.

Validity

What is the validity of this approach?

For straightforward smooth functions, the expanded code really computes both the value of the equation and its exact derivative. This is a simple application of the differentiation rules. So the accuracy of the derivative will be in par with the accuracy of the initial function. If the initial function is a good model of a physical process, the derivative will be a good evaluation of its evolution. If the initial function is only an approximation of a real physical model, then the derivative will be an approximation too, but an approximation that is consistent with the initial function up to computer accuracy.

If the initial function is not smooth, the singular points must be analyzed specially. Some design choices are involved which have an impact on validity. These choices have been made in such a way that in some sense, the result is still as valid, as accurate and as consistent with the initial function as for smooth functions. This point is explained in details in the section about singularities .

Virtual machine execution model

TODO: explain local variables and stack cells.

TODO: explain the various subsets using the same categories as in the first section above (basic arithmetic operations, conversion operations, storage instructions and calls to elementary functions).

Implementation

Differential pairs

TODO: explain that the various methods provided by DifferentialPair are convenience methods for end users. They can be used to perform some additional transforms on already derived code. BTW, is this feature really useful ?

Bytecode transforms

TODO

Differentiation rules

TODO: put all rules in a table for reference.

Complete differentiation example

TODO: step by step analysis of the initial example.