A method to compute exact derivatives automatically, enabling efficient neural network training.
Automatic differentiation (autodiff) is a computational technique for evaluating derivatives of functions expressed as computer programs, producing exact results rather than approximations. Unlike symbolic differentiation—which manipulates mathematical expressions and can produce unwieldy formulas—or numerical differentiation—which introduces floating-point approximation errors through finite differences—autodiff works by decomposing a computation into a sequence of elementary operations and systematically applying the chain rule at each step. The result is machine-precision accuracy at a computational cost proportional to that of evaluating the original function, making it far more practical than its alternatives for the large, nested functions that define modern neural networks.
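The chain-rule decomposition can be made concrete with a minimal forward-mode sketch using dual numbers. This is illustrative pseudocode-style Python, not any library's API: each `Dual` (a hypothetical class) carries a value together with its derivative, and every elementary operation applies the chain rule as it executes.

```python
# Minimal forward-mode autodiff sketch via dual numbers (illustrative only).
# Each Dual carries (value, derivative); elementary ops propagate both.
import math

class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def sin(x):
    # Chain rule for an elementary function: d sin(u) = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

# Differentiate f(x) = x*x + sin(x) at x = 2; exact answer is 2x + cos(x).
x = Dual(2.0, 1.0)  # seed the input's derivative with 1.0
y = x * x + sin(x)
print(y.value, y.deriv)  # derivative equals 4 + cos(2), to machine precision
```

Seeding the input with derivative 1.0 makes the propagated `deriv` the directional derivative along that input, which is why forward mode needs one pass per input direction.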
Autodiff operates in two primary modes. Forward mode propagates derivative information alongside the original computation, computing directional derivatives efficiently when the number of inputs is small, since it requires one forward pass per input direction. Reverse mode, which corresponds to the backpropagation algorithm used in deep learning, accumulates gradients by traversing the computation graph backward from outputs to inputs. Reverse mode is particularly powerful when a model has many parameters but a scalar loss, since a single backward pass computes gradients with respect to all parameters simultaneously—a property that makes training million- or billion-parameter models computationally feasible.
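A toy reverse-mode sketch shows the "one backward pass, all gradients" property. The `Var` class and its tape of backward closures are hypothetical names for illustration, not a real framework's API; the structure loosely mirrors how backpropagation engines record a computation graph and replay it in reverse.

```python
# Minimal reverse-mode autodiff sketch (illustrative, not a framework API).
# Each operation records a closure that applies the local chain rule.
class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self._backward = lambda: None
        self._parents = []

    def __add__(self, other):
        out = Var(self.value + other.value)
        out._parents = [self, other]
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out._parents = [self, other]
        def _backward():
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0  # d(output)/d(output) = 1
        for v in reversed(order):
            v._backward()

# Scalar "loss" of two parameters: L = w1*w2 + w1
w1, w2 = Var(3.0), Var(4.0)
loss = w1 * w2 + w1
loss.backward()
print(w1.grad, w2.grad)  # dL/dw1 = w2 + 1 = 5.0, dL/dw2 = w1 = 3.0
```

One call to `backward()` fills in the gradient of the scalar loss with respect to both parameters; with a million parameters the cost would still be a single reverse traversal of the graph.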
The practical impact of autodiff on machine learning cannot be overstated. Before it was integrated into ML frameworks, researchers had to derive and implement gradients by hand, a tedious and error-prone process that severely limited model complexity. Modern libraries such as TensorFlow and PyTorch are built around autodiff engines, allowing practitioners to define arbitrarily complex model architectures and obtain exact gradients automatically. This capability is what enables gradient-based optimizers like SGD and Adam to train deep networks reliably at scale.
Autodiff became central to machine learning in the mid-2010s when deep learning frameworks made reverse-mode autodiff accessible to a broad research and engineering community. The concept of a dynamic computation graph—introduced by frameworks like Chainer and later PyTorch—further expanded autodiff's flexibility, allowing gradients to flow through control flow structures such as loops and conditionals. Today, autodiff is considered a foundational primitive of differentiable programming, extending beyond neural networks into scientific computing, probabilistic modeling, and physics-based simulation.