Background

Introduction

Automatic differentiation (AD) is a family of techniques for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. Applications of AD include Newton’s method for solving nonlinear equations, real-parameter optimization, probabilistic inference, and backpropagation in neural networks. AD has become especially popular with the rapid growth of machine learning and deep learning. Our AD software package enables users to calculate derivatives using both the forward and the reverse mode.

Our package also includes support for second-order derivatives (including the Hessian matrix), root finding, optimization (Newton's method, gradient descent, BFGS), and backpropagation.

Mathematical Background

Automatic Differentiation decomposes a complex function into a sequence of operations on elementary functions, evaluates the derivatives at each intermediate stage, and repeatedly applies the chain rule to obtain the derivative of the outermost function. We explain the related mathematical concepts below.

Elementary functions

The class of functions consisting of the polynomials, the exponential functions, the logarithmic functions, the trigonometric functions, the inverse trigonometric functions, and the functions obtained from those listed by the four arithmetic operations and by superposition (i.e., composition), applied finitely many times. For example, \(e^{\sin(x)} + x^3\) is an elementary function.

Chain Rule

  - Used to compute the derivative of a composite function
  - Core of automatic differentiation

For the first derivative:

\[\dfrac{dy}{dx} = \dfrac{dy}{du}\cdot\dfrac{du}{dx}\]
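
For example, with the intermediate variable \(u = x^2\) and \(y = \sin(u)\):

\[\dfrac{dy}{dx} = \dfrac{dy}{du}\cdot\dfrac{du}{dx} = \cos(x^2)\cdot 2x\]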

For the second derivative:

\[\dfrac{\partial^2 y}{\partial x_i \partial x_j} = \sum_k(\dfrac{\partial y}{\partial u_k}\dfrac{\partial^2 u_k}{\partial x_i \partial x_j}) + \sum_{k,l}(\dfrac{\partial^2 y}{\partial u_k \partial u_l}\dfrac{\partial u_k}{\partial x_i}\dfrac{\partial u_l}{\partial x_j})\]
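
In the single-variable case with one intermediate variable \(u\), this reduces to a familiar form. Continuing the example \(y = \sin(u)\), \(u = x^2\):

\[\dfrac{d^2 y}{dx^2} = \dfrac{dy}{du}\dfrac{d^2 u}{dx^2} + \dfrac{d^2 y}{du^2}\left(\dfrac{du}{dx}\right)^2 = \cos(x^2)\cdot 2 - \sin(x^2)\cdot (2x)^2\]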

Topological Graph

  - Each node represents a variable
  - Arrows indicate the topological order (order of operations) and the operations themselves (illustrated below)
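
For example, the function \(f(x) = \sin(x^2) + x\) (an illustrative choice, not tied to our package's API) yields a graph with nodes

\[v_1 = x^2, \qquad v_2 = \sin(v_1), \qquad f = v_2 + x,\]

where the arrow from \(x\) to \(v_1\) carries the squaring operation, the arrow from \(v_1\) to \(v_2\) carries \(\sin\), and the arrows from \(v_2\) and \(x\) to \(f\) carry the addition.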

Forward Mode Autodifferentiation

Follow the topological order, storing the value of each variable in its node, and visit each node in that order. Let x denote the input (innermost) variable with respect to which we differentiate. For a variable \(u_i = g_i(v)\), we already know \(\dfrac{dv}{dx}\), so we calculate \(\dfrac{du_i}{dx} = \dfrac{du_i}{dv}\dfrac{dv}{dx}\).
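
Below is a minimal sketch of forward mode using dual numbers that carry a value together with a derivative. The class name `Dual` and the helper `sin` are illustrative assumptions for this sketch, not our package's actual API.

```python
import math

class Dual:
    """A (value, derivative) pair propagated through each elementary operation."""
    def __init__(self, val, der=0.0):
        self.val = val   # value of the variable at this node
        self.der = der   # derivative with respect to the input x

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u' v + u v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def sin(u):
    # Chain rule: d/dx sin(u) = cos(u) * du/dx
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

# Evaluate f(x) = sin(x**2) + x and f'(x) at x = 1.5 in one forward pass.
x = Dual(1.5, 1.0)     # seed dx/dx = 1
f = sin(x * x) + x
print(f.val, f.der)    # f(1.5) and f'(1.5) = 2*1.5*cos(1.5**2) + 1
```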

Reverse Mode Autodifferentiation

Reverse mode consists of a forward computation and a backward computation.

Step 1: Forward Computation

Follow the topological order and store the value of each variable in its node.

Step 2: Backward Computation

Let y denote our final output variable and let \(u_j\), \(v_j\) denote the intermediate variables.

  1. Initialize all partial derivatives \(\dfrac{dy}{du_j}\) to 0 and set \(\dfrac{dy}{dy} = 1\).
  2. Visit each node in reverse topological order. For a variable \(u_i=g_i(v_1,...,v_n)\), we already know \(\dfrac{dy}{du_i}\), so we increment \(\dfrac{dy}{dv_j}\) by \(\dfrac{dy}{du_i}\dfrac{du_i}{dv_j}\) (see the sketch below).
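
A minimal sketch of both steps follows; the class `Var`, the helper `sin`, and the function `backward` are illustrative assumptions, not our package's actual API. The forward computation records each node's value and local partial derivatives; the backward computation visits nodes in reverse topological order and accumulates \(\dfrac{dy}{dv_j}\).

```python
import math

class Var:
    """A graph node storing its value, its parents, and the adjoint dy/d(node)."""
    def __init__(self, val, parents=()):
        self.val = val          # forward computation: value stored in the node
        self.parents = parents  # list of (parent_node, d(self)/d(parent))
        self.grad = 0.0         # dy/d(this node), initialized to 0

    def __add__(self, other):
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.val * other.val, [(self, other.val), (other, self.val)])

def sin(u):
    return Var(math.sin(u.val), [(u, math.cos(u.val))])

def backward(y):
    """Backward computation: accumulate dy/dv_j in reverse topological order."""
    order, seen = [], set()
    def topo(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                topo(parent)
            order.append(node)
    topo(y)
    y.grad = 1.0                               # dy/dy = 1
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local   # increment dy/dv_j by dy/du_i * du_i/dv_j

# y = sin(x1 * x2) + x1: dy/dx1 = x2*cos(x1*x2) + 1, dy/dx2 = x1*cos(x1*x2)
x1, x2 = Var(0.5), Var(2.0)
y = sin(x1 * x2) + x1
backward(y)
print(x1.grad, x2.grad)
```

One backward pass yields the partial derivatives of y with respect to every input at once, which is why reverse mode is the natural choice for backpropagation, where a single scalar loss depends on many parameters.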