The algorithm that trains neural networks by propagating error gradients backward through layers.
Backpropagation — short for "backward propagation of errors" — is the foundational algorithm used to train artificial neural networks. It works by computing how much each weight in the network contributed to the overall prediction error, then using those contributions to update the weights in a direction that reduces future errors. This process relies on the chain rule of calculus, which allows gradients to be efficiently computed layer by layer, starting from the output and moving backward through the network toward the inputs.
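To make that chain-rule bookkeeping concrete, here is a minimal sketch on a "network" reduced to two scalar weights; the values and variable names are invented purely for illustration.

```python
import math

# Tiny scalar "network": h = tanh(w1 * x), y_hat = w2 * h, loss = (y_hat - y)^2
# All numbers here (x, y, w1, w2) are made-up example values.
x, y = 0.5, 1.0          # one input and its target
w1, w2 = 0.8, -0.3       # one weight per "layer"

# Forward pass: compute each intermediate value.
h = math.tanh(w1 * x)
y_hat = w2 * h
loss = (y_hat - y) ** 2

# Backward pass: apply the chain rule from the loss back toward the input,
# multiplying local derivatives one layer at a time.
dloss_dyhat = dloss_dyhat = 2 * (y_hat - y)      # d(loss)/d(y_hat)
dloss_dw2 = dloss_dyhat * h                      # y_hat = w2 * h, so d(y_hat)/d(w2) = h
dloss_dh = dloss_dyhat * w2                      # d(y_hat)/d(h) = w2
dloss_dw1 = dloss_dh * (1 - h ** 2) * x          # h = tanh(w1*x), so d(h)/d(w1) = (1 - h^2) * x

print(dloss_dw1, dloss_dw2)
```

Each weight's gradient is just the product of the local derivatives along the path from the loss back to that weight, which is exactly what the backward pass accumulates layer by layer.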
During a forward pass, input data flows through the network and produces a prediction. The difference between that prediction and the true target is quantified by a loss function. Backpropagation then performs a backward pass, computing the gradient of the loss with respect to every weight in the network. These gradients indicate how steeply the loss changes as each weight shifts, and an optimization algorithm — typically stochastic gradient descent or one of its variants — uses them to nudge the weights in the direction that lowers the loss. This cycle of forward and backward passes repeats over many training examples until the network converges on a useful set of weights.
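The full cycle can be sketched in NumPy for a two-layer network trained with hand-written backpropagation and plain gradient descent; the data, layer sizes, and learning rate below are placeholder choices, not a prescribed setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a two-layer network; shapes are illustrative only.
X = rng.normal(size=(64, 3))                # 64 examples, 3 features
y = rng.normal(size=(64, 1))                # targets
W1 = rng.normal(scale=0.1, size=(3, 8))     # first-layer weights
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1))     # second-layer weights
b2 = np.zeros(1)
lr = 0.1                                    # learning rate for gradient descent

for step in range(200):
    # Forward pass: inputs flow through the layers to a prediction.
    z1 = X @ W1 + b1
    h1 = np.tanh(z1)
    y_hat = h1 @ W2 + b2

    # Loss: mean squared error between prediction and target.
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradient of the loss with respect to every weight,
    # computed via the chain rule from the output back toward the input.
    grad_yhat = 2 * (y_hat - y) / len(X)    # d(loss)/d(y_hat)
    grad_W2 = h1.T @ grad_yhat              # d(loss)/d(W2)
    grad_b2 = grad_yhat.sum(axis=0)
    grad_h1 = grad_yhat @ W2.T              # propagate back through layer 2
    grad_z1 = grad_h1 * (1 - h1 ** 2)       # through the tanh nonlinearity
    grad_W1 = X.T @ grad_z1                 # d(loss)/d(W1)
    grad_b1 = grad_z1.sum(axis=0)

    # Update: nudge each weight opposite to its gradient to lower the loss.
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
```

Repeating this forward/backward/update loop over many batches is, in miniature, the training procedure the paragraph above describes.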
The practical significance of backpropagation is difficult to overstate. Before its widespread adoption, training multi-layer networks was computationally intractable or relied on heuristic methods that didn't scale. Backpropagation made deep networks trainable in a principled, efficient way, unlocking the potential of architectures with many hidden layers. Its 1986 formalization by Rumelhart, Hinton, and Williams in the context of neural networks gave the field a common, reproducible framework that researchers could build upon.
Today, backpropagation remains the engine behind virtually every modern deep learning system, from image classifiers and language models to reinforcement learning agents. Automatic differentiation libraries like PyTorch and TensorFlow implement backpropagation under the hood, freeing practitioners from computing gradients by hand while preserving the algorithm's mathematical rigor. Despite decades of architectural innovation, no comparably general and efficient alternative has displaced it.
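For comparison, the same forward-loss-backward-update cycle expressed with PyTorch's automatic differentiation needs only a call to loss.backward(); the model shape, data, and hyperparameters here are again placeholders for illustration.

```python
import torch

# A small two-layer network; sizes are arbitrary example values.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

X = torch.randn(64, 3)   # toy inputs
y = torch.randn(64, 1)   # toy targets

for step in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(X), y)  # forward pass plus loss
    loss.backward()              # backward pass: autograd runs backpropagation
    optimizer.step()             # SGD update using the stored gradients
```

The gradients computed by loss.backward() are the same quantities the manual sketch above derives by hand; the library simply records the forward computation and applies the chain rule through it automatically.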