Principles

Differentiation at bytecode instructions level

Nabla computes the derivatives by applying the classical differentiation rules at bytecode level. When an instance of a class implementing UnivariateFunction is passed to its differentiate method, Nabla tracks the mathematical operations flow that leads from the t parameter to the return value of the function. At the bytecode instructions level, all operations are elementary ones. Each elementary operation is then changed to compute both the value and the derivatives. Nothing is changed to the control flow instructions (loops, branches, operations scheduling).

Analysis and transformation of the bytecode is realized using both the core API and the tree API from the asm bytecode manipulation and analysis framework. The entry point of this differentiation process is the differentiate method of the MethodDifferentiator which is called from the ForwardModeDifferentiator class for processing the f method of the user class.

All the changed operations belong to a small subset of the virtual machine instructions. This set contains basic arithmetic operations (addition, subtraction ...), conversion operations (double to int, long to double ...), storage instructions (local variables, functions parameters, instance or class fields ...) and calls to elementary functions defined in the Math, StrictMath, FastMath or similar classes. There is really nothing more! For each one of these basic bytecode instructions, Nabla knows how to map them to a mathematical equation and how to hand these equations to a class that will compute derivatives.

Lets consider the DADD bytecode instruction and consider only first derivative for now. This instruction corresponds to the addition of two real numbers and produces a third number which is their sum. Nabla maps the instruction to the equation:

c=a+b
and calls the DerivativeStructure class provided by Apache Commons Math to compute both the value and the first derivative:
(c=a+b, c'=a'+b')
In this example, the DerivativeStructure class uses only the linearity property of differentiation which implies that the derivative of a sum is the sum of the derivatives. Similar rules exist for all arithmetic instructions. The derivative of all basic functions in the Math , StrictMath and FastMath are known. The rules are also known for any derivation order, they are not limited to first order. In fact, Nabla itself does not know any of these rules, all computations are delegated to DerivativeStructure.

Lets consider a more extensive example:

          public class Linear implements UnivariateFunction {
              public double value(double t) {
                  double result = 1;
                  for (int i = 0; i < 3; ++i) {
                      result = result * (2 * t + 1);
                  }
                  return result;
              }
          }
        

In this example, the only things that need to be changed for differentiating the value method are the t parameter and the result local variable which must be adapted to hold the derivative structure, the two multiplications and the addition. In order to generate the new function, Nabla will convert these elements one at a time, starting from the parameter change and propagating the change using a simple data flow analysis. There is no need to analyze the global structure of the code, and no need to change anything in the loop handling instructions above. The method generated for the example above will be roughly similar to the one that would result from compiling the code below (except that Nabla generates bytecode directly, it does not use source at all):

          public DerivativeStructure value(DerivativeStructure t) {

              // source roughly equivalent to conversion of the result initialization
              DerivativeStructure result  = new DerivativeStructure(t.getFreeParameters(),
                                                                    t.getOrder(),
                                                                    1.0);

              // this loop handling code is not changed at all
              for (int i = 0; i < 3; ++i) {

                  // source roughly equivalent to conversion of "2 * ..."
                  DerivativeStructure tmpA = t.multiply(2.0);

                  // source roughly equivalent to conversion of "... + 1"
                  DerivativeStructure tmpB = tmpA.add(1.0);

                  // source roughly equivalent to conversion of "result * ..."
                  DerivativeStructure tmpC = result.multiply(tmpB);

                  // source roughly equivalent to conversion of "result = ..."
                  result = tmpC;

              }

              // source equivalent to code generated at method exit
              return result;

          }
        

This example shows that the instructions conversions have local scope.

double converted to DerivativeStructure

Another thing that is shown in the previous example is that DerivativeStructure instances appear everywhere a double that depends on the input parameter appears, like the result variable and all the temporary variables. However, double that do not depend on the input parameter like the 2.0 and 1.0 literal constants remain primitive double values.

no dependency on differentiation order or number of free parameters

The code above also shows that the generated code does not depend on the derivation order or number or free parameters. In fact, this information is only carried at runtime by the DerivativeStructure instance provided as an unput parameters, and the intermediate instances created on the fly will automatically share these values (see the construction of the result variable and the call to getFreeParameters and getOrder).

Data flow analysis

As shown in the example above, the original method bytecode contains both immutable parts that must be preserved and parts belonging to what we will call the computation path from the t parameter to the result that must be differentiated. The instructions that belong to the computation path and hence that must be converted are identified by a data flow analysis seeded with the t parameter.

The first step of this data flow analysis is to link each data element (either stack cell or local variable, as explained in the virtual machine execution model section) with the instructions that may produce it and the instructions that may consume it. This task is realized by the TrackingInterpreter and TrackingValue classes.

As explained in the double converted to DerivativeStructure section of the usage documentation, the signature of the value method is changed. The primitive doublet parameter in the original method is changed during the differentiation process to a DerivativeStructure instance in the generated derivative.

All instructions that used the original primitive double parameter must be changed to cope with the new DerivativeStructure local variables. In order to do this, the representation of the t parameter in the bytecode is marked as pending conversion from one primitive double to a DerivativeStructure. Once this data element has been marked, the data flow will propagate the mark to other data elements (both variables and operand stack cells) thanks to the following rules:

  • each instruction that consumes a marked data element must be changed
  • each primitive double data element produced by a changed instruction must be marked as pending conversion from one primitive double to a DerivativeStructure
These rules propagate the changes for data and instructions throughout the method code, affecting all instructions that need to be changed and leaving other instructions alone, like the loop handling in the example above.

Validity

What is the validity of this approach?

For straightforward smooth functions, the expanded code really computes both the value of the equation and its exact derivatives. This is a simple application of the differentiation rules. So the accuracy of the derivative will be in par with the accuracy of the initial function. If the initial function is a good model of a physical process, the derivative will be a good evaluation of its evolution. If the initial function is only an approximation of a real physical model, then the derivative will be an approximation too, but an approximation that is consistent with the initial function up to computer accuracy.

If the initial function is not smooth, the singular points must be analyzed specially. Some design choices are involved which have an impact on validity. These choices have been made in such a way that in some sense, the result is still as valid, as accurate and as consistent with the initial function as for smooth functions. This point is explained in details in the section about singularities.

Virtual machine execution model

TODO: explain local variables and stack cells.

TODO: explain the various subsets using the same categories as in the first section above (basic arithmetic operations, conversion operations, storage instructions and calls to elementary functions).

Implementation

Bytecode transforms

TODO

Complete differentiation example

TODO: step by step analysis of the initial example.