Background¶
Introduction¶
Automatic differentiation (AD) is a family of techniques for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. Applications of AD include Newton’s method for solving nonlinear equations, real-parameter optimization, probabilistic inference, and backpropagation in neural networks. AD has become extremely popular because of the booming development of machine learning and deep learning techniques. Our AD software package enables users to calculate derivatives using forward and reverse mode.
Our package's features include support for second-order derivatives (including the Hessian matrix), root finding, optimization (Newton's method, gradient descent, BFGS), and backpropagation.
Mathematical Background¶
Automatic differentiation decomposes a complex function into a sequence of operations on elementary functions, evaluates the derivatives at each intermediate stage, and repeatedly applies the chain rule to obtain the derivative of the outermost function. We provide explanations of the related math concepts below.
Elementary functions
The class of functions consisting of polynomials, exponential functions, logarithmic functions, trigonometric functions, inverse trigonometric functions, and the functions obtained from those listed by the four arithmetic operations and by superposition (i.e., composition), applied finitely many times.
Chain Rule - Used to compute the derivative of a composite function - Core of automatic differentiation
For \(y = f(u)\) and \(u = g(x)\), the first derivative is:

\(\dfrac{dy}{dx} = \dfrac{dy}{du}\dfrac{du}{dx}\)

For the second derivative:

\(\dfrac{d^2y}{dx^2} = \dfrac{d^2y}{du^2}\left(\dfrac{du}{dx}\right)^2 + \dfrac{dy}{du}\dfrac{d^2u}{dx^2}\)
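As a concrete check of the chain rule, the sketch below differentiates a composite function by hand-coding the rule and compares the result against a finite-difference approximation. The function choices (`sin` composed with squaring) are made up for this illustration and are not part of the package API.

```python
import math

def g(x):          # inner function: g(x) = x**2
    return x * x

def dg(x):         # g'(x) = 2x
    return 2 * x

def f(u):          # outer function: f(u) = sin(u)
    return math.sin(u)

def df(u):         # f'(u) = cos(u)
    return math.cos(u)

def composite_derivative(x):
    # chain rule: d/dx f(g(x)) = f'(g(x)) * g'(x)
    return df(g(x)) * dg(x)

x = 1.5
# agrees with a central finite-difference approximation
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
assert abs(composite_derivative(x) - numeric) < 1e-6
```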
Topological Graph - Each node represents a variable - Arrows indicate topological order (order of operations) and the operations themselves.
Forward Mode Autodifferentiation
Follow the topological order and store the value of each variable in its node, visiting each node in topological order. Let \(x\) denote our input variable. For variable \(u_i = g_i(v)\), where we already know \(\dfrac{dv}{dx}\), calculate \(\dfrac{du_i}{dx} = \dfrac{du_i}{dv}\dfrac{dv}{dx}\)
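Forward mode is often implemented with dual numbers, which carry a value and its derivative together through each operation. The sketch below is a minimal illustration of this idea; the `Dual` class and its methods are hypothetical names, not this package's actual API.

```python
class Dual:
    """Carries a value and its derivative with respect to the input x."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def f(x):
    return x * x + 3 * x + 1  # f'(x) = 2x + 3


x = Dual(2.0, 1.0)  # seed the derivative: dx/dx = 1
y = f(x)
print(y.val, y.der)  # 11.0 7.0
```

Evaluating `f` on a `Dual` input propagates both the value and the derivative in a single forward pass, which is exactly the node-by-node traversal described above.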
Reverse Mode Autodifferentiation
Reverse mode consists of a forward computation followed by a backward computation.
Step 1: Forward Computation
Follow the topological order and store the value of each variable in its node.
Step 2: Backward Computation
Let \(y\) denote our final output variable and \(u_i\), \(v_j\) denote the intermediate variables.
- Initialize all partial derivatives \(\dfrac{dy}{du_j}\) to 0 and \(\dfrac{dy}{dy}\) to 1.
- Visit each node in reverse topological order. For variable \(u_i = g_i(v_1, \ldots, v_n)\), where we already know \(\dfrac{dy}{du_i}\), increment \(\dfrac{dy}{dv_j}\) by \(\dfrac{dy}{du_i}\dfrac{du_i}{dv_j}\)
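The two steps above can be sketched as follows: each operation records its parents and local derivatives during the forward pass, and the backward pass walks the graph in reverse topological order, incrementing each parent's gradient. The `Node` class and its method names are illustrative assumptions, not this package's API.

```python
class Node:
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # pairs (parent node, local derivative du_i/dv_j)
        self.grad = 0.0         # accumulates dy/d(this node); initialized to 0

    def __add__(self, other):
        return Node(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.val * other.val,
                    [(self, other.val), (other, self.val)])

    def backward(self):
        # collect nodes in topological order
        order, seen = [], set()

        def topo(n):
            if id(n) not in seen:
                seen.add(id(n))
                for p, _ in n.parents:
                    topo(p)
                order.append(n)

        topo(self)
        self.grad = 1.0  # dy/dy = 1
        # visit in reverse topological order, incrementing
        # dy/dv_j by (dy/du_i) * (du_i/dv_j)
        for n in reversed(order):
            for p, local in n.parents:
                p.grad += n.grad * local


x, y = Node(2.0), Node(3.0)
z = x * y + x  # z = xy + x, so dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Note that `x.grad` is *incremented* twice (once through the product, once through the sum), which is why the partial derivatives must be initialized to 0 before the backward pass.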