A CNN architecture using skip connections to enable training of very deep networks.
ResNet, short for Residual Network, is a convolutional neural network architecture designed to make training very deep networks practical and effective. Introduced by Kaiming He and colleagues at Microsoft Research in 2015, it addressed a fundamental obstacle in deep learning: beyond a certain depth, adding layers to a plain network makes it harder to optimize, so training error rises rather than falls. The original paper calls this the degradation problem and notes that it is distinct from overfitting and persists even when careful initialization and normalization keep gradients from vanishing or exploding outright. ResNet's solution was simple: rather than expecting each stack of layers to learn a direct mapping H(x) from input to output, it asks them to learn a residual function F(x) = H(x) - x, the difference between the desired output and the input itself.
The key mechanism enabling this is the skip connection (also called a shortcut connection), which routes the input of a block directly to its output, bypassing one or more intermediate layers. The input is added element-wise to the transformed output before the sum passes to the next block. Mathematically, if a block learns a function F(x), the block's output becomes F(x) + x rather than F(x) alone. This seemingly minor change has a profound effect. Because the derivative of F(x) + x with respect to x contains an identity term, gradients can flow backward through the skip path without being attenuated by layer after layer of transformations. A block can also fall back to the identity mapping simply by driving F(x) toward zero. Together these properties make it feasible to train networks with hundreds or even more than a thousand layers.
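The F(x) + x computation can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation. It assumes a two-layer fully connected branch instead of convolutions, omits batch normalization, and uses hypothetical helper names (`linear`, `residual_block`, `plain_block`, `chain_gradient`). The last function is a scalar toy that shows the identity term in the chain rule.

```python
# Minimal sketch of a residual block on plain Python lists (no framework).
# Assumptions: dense layers stand in for convolutions; no batch norm.

def relu(v):
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Multiply a weight matrix (list of rows) by the vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def plain_block(x, w1, w2):
    """Plain stack: the output is relu(F(x)) with no shortcut."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu(fx)

def residual_block(x, w1, w2):
    """Residual stack: the output is relu(F(x) + x)."""
    fx = linear(w2, relu(linear(w1, x)))         # residual branch F(x)
    return relu([f + a for f, a in zip(fx, x)])  # skip connection adds x back

def chain_gradient(local_slope, depth, residual):
    """d(output)/d(input) for `depth` stacked scalar layers.

    Each layer is y = s * x (plain) or y = s * x + x (residual), so the
    per-layer derivative is s or s + 1 respectively.
    """
    factor = local_slope + 1.0 if residual else local_slope
    return factor ** depth

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With F(x) = 0 the residual block passes x through; the plain block does not.
print(residual_block(x, zeros, zeros))
print(plain_block(x, zeros, zeros))

# Toy gradient through 20 layers with a small local slope of 0.1.
print(chain_gradient(0.1, 20, residual=False))  # roughly 1e-20
print(chain_gradient(0.1, 20, residual=True))   # about 6.7
```

With all-zero weights the residual block returns its non-negative input unchanged, while the plain block outputs zeros. This is one way to see why extra residual blocks need not hurt a shallower solution. The scalar toy is deliberately simplified; real networks have matrix-valued Jacobians and nonlinearities between layers.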
ResNet's impact was immediately apparent when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) classification task in 2015, where an ensemble of residual networks achieved a top-5 error of 3.57% on the ImageNet test set. The original paper introduced five ImageNet variants (ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152), with the number indicating the count of weighted layers, that is, convolutional and fully connected layers. Subsequent work extended the concept further, producing architectures like ResNeXt and Wide ResNet, which build directly on the residual learning principle, and DenseNet, which replaces the additive shortcut with concatenation of earlier feature maps.
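The naming convention can be checked with a little arithmetic. The sketch below assumes the per-stage block counts reported for the five ImageNet variants in the original paper. It counts one stem convolution, the convolutions inside every residual block (two in the basic block, three in the bottleneck block), and the final fully connected layer. Projection shortcuts are not counted. The names `STAGE_BLOCKS` and `weighted_layers` are illustrative.

```python
# Residual blocks per stage and conv layers per block for each variant.
# depth: ([blocks in stages 1-4], convs per block)
STAGE_BLOCKS = {
    18: ([2, 2, 2, 2], 2),    # basic blocks (two 3x3 convs)
    34: ([3, 4, 6, 3], 2),
    50: ([3, 4, 6, 3], 3),    # bottleneck blocks (1x1, 3x3, 1x1 convs)
    101: ([3, 4, 23, 3], 3),
    152: ([3, 8, 36, 3], 3),
}

def weighted_layers(blocks_per_stage, convs_per_block):
    """1 stem conv + convs inside all residual blocks + 1 final FC layer."""
    return 1 + sum(blocks_per_stage) * convs_per_block + 1

for depth, (blocks, convs) in STAGE_BLOCKS.items():
    print(f"ResNet-{depth}: {weighted_layers(blocks, convs)} weighted layers")
```

For example, ResNet-50 has 3 + 4 + 6 + 3 = 16 bottleneck blocks of three convolutions each, giving 48 layers plus the stem convolution and the classifier, for a total of 50.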
Beyond image classification, ResNet has become a foundational backbone across computer vision tasks including object detection, semantic segmentation, and medical image analysis. Its design principles have also influenced architectures outside vision, most notably the Transformer, which wraps each attention and feed-forward sublayer in a residual connection. The residual connection is now considered a standard building block in deep learning, demonstrating how a targeted structural innovation can unlock entirely new scales of model depth.