
Envisioning is an emerging technology research institute and advisory.


Residual Connections

Shortcut connections in deep networks that enable training of much deeper architectures.

Year: 2015
Generality: 834

Residual connections are architectural shortcuts in deep neural networks that bypass one or more layers by adding a block's input directly to its output. Formally, instead of learning a direct mapping H(x), the bypassed layers learn a residual function F(x) = H(x) − x, so the full output becomes F(x) + x. This seemingly simple modification has profound consequences. Because the derivative of F(x) + x with respect to x is the derivative of F plus the identity, gradients can flow backward through the additive shortcut without passing through potentially saturating nonlinearities. This substantially mitigates the vanishing gradient problem that had long prevented practitioners from training very deep networks.
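The effect of the additive shortcut on gradients can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical scalar residual function F(x) = tanh(w · x) and a stack of 30 identical layers; the function names and the weight value 0.5 are illustrative choices, not part of any particular architecture.

```python
import math

def residual_branch(x, w):
    """Hypothetical residual function F(x) = tanh(w * x) on a scalar input."""
    return math.tanh(w * x)

def plain_layer(x, w):
    """Plain layer: the output is the learned mapping itself."""
    return residual_branch(x, w)

def residual_layer(x, w):
    """Residual layer: the output is F(x) + x."""
    return residual_branch(x, w) + x

def branch_derivative(x, w):
    """dF/dx for F(x) = tanh(w * x)."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def input_gradient(x, weights, use_shortcut):
    """Run a stack of layers and return d(output)/d(input) via the chain rule."""
    grad = 1.0
    for w in weights:
        local = branch_derivative(x, w)
        if use_shortcut:
            grad *= 1.0 + local  # the shortcut contributes the identity term
            x = residual_layer(x, w)
        else:
            grad *= local        # only the branch derivative survives
            x = plain_layer(x, w)
    return grad

weights = [0.5] * 30  # illustrative value, chosen so each branch derivative is below 1
plain_grad = input_gradient(1.0, weights, use_shortcut=False)
residual_grad = input_gradient(1.0, weights, use_shortcut=True)
print(f"plain stack:    {plain_grad:.3e}")
print(f"residual stack: {residual_grad:.3e}")
```

In this toy setting the plain stack multiplies 30 factors that are each at most 0.5, so the gradient with respect to the input all but vanishes, while every factor in the residual stack is at least 1. Real networks use vector-valued layers and learned weights, so this is a sketch of the mechanism rather than a prediction of actual gradient magnitudes.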

The practical impact is substantial. Before residual connections, plain networks deeper than roughly 20–30 layers would often perform worse than shallower counterparts. The cause was not overfitting but optimization difficulty: the deeper networks reached higher training error, not just higher test error. Residual connections broke this barrier, enabling the training of networks with hundreds or even thousands of layers. The original ResNet architecture won the 2015 ImageNet classification challenge (ILSVRC 2015) using a 152-layer model, a depth that would have been essentially untrainable with conventional feedforward designs.

Residual connections have proven remarkably general beyond image classification. They are a foundational component of the Transformer architecture, where they wrap every attention and feed-forward sublayer and help stabilize the training of very deep language models. Residual connections, or closely related skip connections, appear in many of the prominent deep learning architectures developed since 2015, including U-Nets for image segmentation, which concatenate encoder feature maps into the decoder rather than adding them. Their success has inspired related ideas such as dense connections, which concatenate each layer's output with the feature maps of all preceding layers in a block. Gated variants, as in highway networks (proposed shortly before ResNet), learn how much of the shortcut to pass through.
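The placement of the shortcut around Transformer sublayers can be sketched as follows. This is a minimal illustration in plain Python, assuming the post-norm arrangement LayerNorm(x + Sublayer(x)) of the original Transformer; `toy_attention` and `toy_feed_forward` are hypothetical stand-ins with no learned parameters, not real attention or feed-forward layers.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add(u, v):
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]

def residual_sublayer(tokens, sublayer):
    """Post-norm residual wrapper: LayerNorm(x + Sublayer(x)), applied per token."""
    outputs = sublayer(tokens)
    return [layer_norm(add(x, y)) for x, y in zip(tokens, outputs)]

def toy_attention(tokens):
    """Hypothetical stand-in for attention: every token receives the
    average of all tokens (uniform attention weights, no projections)."""
    n = len(tokens)
    mean_token = [sum(col) / n for col in zip(*tokens)]
    return [list(mean_token) for _ in tokens]

def toy_feed_forward(tokens):
    """Hypothetical stand-in for the feed-forward sublayer: ReLU, then scale by 0.5."""
    return [[0.5 * max(v, 0.0) for v in tok] for tok in tokens]

def transformer_block(tokens):
    """One block: each sublayer is wrapped in its own residual connection."""
    tokens = residual_sublayer(tokens, toy_attention)
    tokens = residual_sublayer(tokens, toy_feed_forward)
    return tokens

tokens = [[1.0, -2.0, 3.0], [0.5, 0.0, -1.5]]  # 2 tokens, 3 features each
out = transformer_block(tokens)
```

Many later models instead normalize the input to the sublayer and add the result to the unnormalized shortcut (pre-norm), which changes only the body of `residual_sublayer` in this sketch.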

Empirical and theoretical studies suggest that residual connections smooth the loss landscape, making it less chaotic and easier for gradient-based optimizers to navigate. They also provide an implicit ensemble effect: a deep residual network can be interpreted as a collection of many paths of varying length through the network, each contributing to the final prediction. Residual networks also degrade gracefully when individual layers are dropped at test time. This robustness to layer removal supports the ensemble interpretation and underscores why residual connections have become a near-universal design pattern in modern deep learning.
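The path-based reading of a residual network can be made concrete with a small sketch. It assumes hypothetical linear scalar branches F_i(x) = a_i · x, for which the expansion into paths is exact; with nonlinear branches the decomposition is an interpretation rather than an identity. The gain values are arbitrary illustrative numbers.

```python
from itertools import product

def forward(x, gains, skip=None):
    """Stack of residual blocks with linear branches F_i(x) = gains[i] * x.
    If `skip` is given, that block is dropped, leaving only its shortcut."""
    for i, a in enumerate(gains):
        if i == skip:
            continue
        x = x + a * x
    return x

def sum_over_paths(x, gains):
    """Enumerate every path: at each block a path takes either the shortcut
    or the branch. Returns the summed path outputs and the number of paths."""
    total = 0.0
    count = 0
    for choice in product([False, True], repeat=len(gains)):
        path = x
        for use_branch, a in zip(choice, gains):
            if use_branch:
                path = a * path  # this path goes through the branch
            # otherwise the path takes the shortcut and is unchanged
        total += path
        count += 1
    return total, count

gains = [0.1, -0.2, 0.05]  # illustrative branch gains
full = forward(1.0, gains)
paths_total, n_paths = sum_over_paths(1.0, gains)
dropped = forward(1.0, gains, skip=1)
print(full, paths_total, n_paths, dropped)
```

With three blocks there are 2^3 = 8 paths, and their sum matches the stacked forward pass. Dropping one block removes only the paths that pass through its branch, so half of the paths remain intact, which is one way to picture why removing a single layer need not break the whole network.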

Related

ResNet (Residual Network)

A CNN architecture using skip connections to enable training of very deep networks.

Generality: 795
DRL (Deep Residual Learning)

A neural network design using skip connections so layers learn residual mappings, enabling much deeper models.

Generality: 752
Vanishing Gradient

A training failure where gradients shrink exponentially, preventing early network layers from learning.

Generality: 720
Hidden Layer

An intermediate neural network layer that learns internal representations of data.

Generality: 796
RNN (Recurrent Neural Network)

Neural networks with feedback connections that process sequential data using internal memory.

Generality: 838
Reservoir Computing

A framework using fixed random recurrent networks to efficiently learn from temporal data.

Generality: 579