A neural network design using skip connections so layers learn residual mappings, enabling much deeper models.
Deep residual learning is an architectural design principle in which each block of layers learns a residual function F(x) = H(x) - x rather than attempting to directly approximate the desired underlying mapping H(x). The block's output is computed as F(x) + x, where x travels along an identity shortcut connection that bypasses the learned layers entirely. When input and output dimensions differ, a linear projection of x (for example, a 1x1 convolution) replaces the identity so that the two terms can still be added. This reformulation means the layers only need to learn what to add to the input, not reconstruct the full target representation from scratch; in particular, if the identity mapping is close to optimal, the block can approximate it simply by driving F(x) toward zero.
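The computation F(x) + x can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the original implementation: the residual branch uses two dense layers instead of convolutions, batch normalization and the activation after the addition are omitted, and the names `residual_block`, `w1`, `w2`, and `w_proj` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2, w_proj=None):
    """Return F(x) + shortcut(x) for a two-layer dense residual branch.

    Simplifying assumptions: dense layers stand in for convolutions, and
    there is no normalization or activation after the addition.
    """
    f = relu(x @ w1) @ w2                            # residual function F(x)
    shortcut = x if w_proj is None else x @ w_proj   # identity or linear projection
    return f + shortcut

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # batch of 4 inputs, 8 features each

# Matching dimensions: identity shortcut. With a zero second layer,
# F(x) = 0 and the block reduces to the identity mapping.
w1 = rng.normal(size=(8, 16)) * 0.1
w2_zero = np.zeros((16, 8))
y_identity = residual_block(x, w1, w2_zero)

# Differing dimensions (8 -> 12): a linear projection replaces the identity.
w2_wide = rng.normal(size=(16, 12)) * 0.1
w_proj = rng.normal(size=(8, 12)) * 0.1
y_projected = residual_block(x, w1, w2_wide, w_proj)
```

The zero-weight case shows why the reformulation helps: representing the identity only requires the residual branch to output zero, rather than requiring a stack of nonlinear layers to reproduce its input.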
The practical motivation stems from the degradation problem: as plain networks grow deeper, accuracy saturates and then worsens, and the training error itself rises, so the cause is optimization difficulty rather than overfitting. Residual connections address this by making the identity mapping easy to represent and by giving gradients a direct path backward through the network, which mitigates vanishing and exploding gradients. In practice, residual blocks are typically built from convolutions, batch normalization, and ReLU activations, often arranged in bottleneck configurations (a 1x1 convolution that reduces the channel count, a 3x3 convolution, and a 1x1 convolution that restores it) that reduce computational cost while preserving representational capacity. These design choices allow stable training of networks with hundreds or even more than a thousand layers.
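The gradient-path argument can be illustrated numerically. The sketch below assumes purely linear layers with small random weights (a deliberate simplification; real blocks are nonlinear and normalized) and compares the input-to-output Jacobian of a plain stack, a product of weight matrices W, with that of a residual stack, a product of (I + W). The function name `input_output_jacobian` and the chosen depth, width, and weight scale are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 16

# Small random weights, shared by both stacks so the comparison is fair.
weights = [rng.normal(size=(width, width)) * 0.05 for _ in range(depth)]

def input_output_jacobian(weights, residual):
    """Jacobian of a stack of linear layers with respect to its input.

    Plain layer:    x -> W x        (layer Jacobian W)
    Residual layer: x -> x + W x    (layer Jacobian I + W)
    """
    jac = np.eye(width)
    for w in weights:
        layer_jac = np.eye(width) + w if residual else w
        jac = layer_jac @ jac
    return jac

plain_norm = np.linalg.norm(input_output_jacobian(weights, residual=False))
residual_norm = np.linalg.norm(input_output_jacobian(weights, residual=True))

print(f"plain stack Jacobian norm:    {plain_norm:.3e}")
print(f"residual stack Jacobian norm: {residual_norm:.3e}")
```

In this toy setting the plain stack's Jacobian collapses toward zero as depth grows, while the identity term keeps the residual stack's Jacobian at a usable scale. It is an illustration of the mechanism, not a proof that residual networks never suffer from gradient problems.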
The concept was introduced by Kaiming He and colleagues at Microsoft Research in their 2015 paper "Deep Residual Learning for Image Recognition". Their ResNet entry won the classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that year by a significant margin. The result was immediately influential: ResNet and its variants became the default backbone for computer vision tasks including image classification, object detection, and semantic segmentation. Residual-style connections subsequently appeared in speech recognition, natural language processing, and generative models, and they are a foundational structural element in many modern architectures, including the Transformer.
Beyond empirical success, residual networks carry theoretical significance. They can be interpreted through the lens of dynamical systems, where each block approximates a small update step in an iterative refinement process: the update x + F(x) has the same form as one forward Euler step for an ordinary differential equation. This analogy connects deep networks to numerical ODE solvers and motivates continuous-depth models like Neural ODEs. The inductive bias toward incremental representation refinement, combined with improved gradient flow, makes residual connections one of the most broadly adopted and theoretically grounded ideas in contemporary machine learning.
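The ODE analogy can be made concrete with a small sketch. It assumes a fixed, hand-written vector field `f` shared by every block (a hypothetical stand-in for a learned residual function) and a step size h = 1 / num_blocks, so that a stack of residual updates x + h * f(x) is exactly forward Euler integration of dx/dt = f(x) over one unit of time.

```python
import numpy as np

def f(x):
    # Hypothetical vector field standing in for a learned residual function.
    # For dx/dt = -x the exact solution is x(t) = x(0) * exp(-t).
    return -x

def residual_stack(x0, num_blocks, total_time=1.0):
    """Apply num_blocks residual updates, each one forward Euler step."""
    h = total_time / num_blocks
    x = x0
    for _ in range(num_blocks):
        x = x + h * f(x)        # residual update: input plus a small correction
    return x

x0 = np.array([1.0, 2.0])
exact = x0 * np.exp(-1.0)       # continuous-depth limit at t = 1

coarse = residual_stack(x0, num_blocks=4)
fine = residual_stack(x0, num_blocks=256)

coarse_error = np.max(np.abs(coarse - exact))
fine_error = np.max(np.abs(fine - exact))
print(f"error with   4 blocks: {coarse_error:.4f}")
print(f"error with 256 blocks: {fine_error:.4f}")
```

As the number of blocks grows and each update shrinks, the discrete stack approaches the continuous trajectory, which is the limit that continuous-depth models take as their starting point.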