Envisioning is an emerging technology research institute and advisory.


Convolution

A sliding filter operation that extracts spatial patterns from input data.

Year: 1989 · Generality: 871

In machine learning, convolution is a mathematical operation that applies a small matrix of learned weights — called a kernel or filter — across an input array, computing element-wise products and summing them into a single output value at each position. By sliding this kernel systematically across the input (an image, audio signal, or other structured data), the operation produces a feature map that highlights where particular patterns, such as edges, textures, or shapes, appear in the data. The same kernel weights are reused at every position, a property called weight sharing, which dramatically reduces the number of parameters compared to fully connected layers and encodes a useful inductive bias: that meaningful patterns can appear anywhere in the input.
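The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal, unoptimized illustration (the helper name `conv2d` is ours); note that deep learning frameworks typically compute cross-correlation, i.e. they slide the kernel without flipping it, which is what this sketch does:

```python
import numpy as np

def conv2d(x, kernel):
    """Slide `kernel` over `x`, summing element-wise products at each position.
    No kernel flip, matching the convention used in deep learning libraries."""
    kh, kw = kernel.shape
    oh = x.shape[0] - kh + 1  # "valid" output height
    ow = x.shape[1] - kw + 1  # "valid" output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative input: an image whose right half is bright.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A simple vertical-edge detector: responds where intensity rises left-to-right.
edge_kernel = np.array([[-1.0, 1.0]])

feature_map = conv2d(image, edge_kernel)
# The feature map is nonzero only in the column where the dark/bright edge sits.
```

Because the same `edge_kernel` weights are applied at every position, the edge is detected wherever it appears — exactly the weight-sharing property described above.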

Convolution is the defining operation of convolutional neural networks (CNNs), where multiple learned filters are stacked in successive layers. Early layers tend to detect low-level features like edges and color gradients, while deeper layers combine these into increasingly abstract representations such as object parts or semantic categories. Hyperparameters like stride (how far the kernel moves at each step) and padding (how the input boundaries are handled) control the spatial dimensions of the output. Pooling layers are often interleaved with convolutional layers to further reduce spatial resolution and build translational robustness.
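The effect of stride and padding on output dimensions follows a standard formula, floor((n + 2p − k) / s) + 1 for input size n, kernel size k, padding p, and stride s. A quick sketch (the helper name is ours):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

# A 224-pixel-wide input with a 3x3 kernel:
print(conv_output_size(224, 3))                       # no padding ("valid"): 222
print(conv_output_size(224, 3, padding=1))            # "same" padding: 224
print(conv_output_size(224, 3, stride=2, padding=1))  # stride 2 halves it: 112
```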

The practical importance of convolution in AI became clear with Yann LeCun's LeNet architecture in the late 1980s and 1990s, which applied convolutional layers to handwritten digit recognition with strong results. The concept exploded in relevance after AlexNet's victory in the 2012 ImageNet competition demonstrated that deep CNNs trained on GPUs could dramatically outperform other approaches on large-scale image classification. Since then, convolution has become a foundational building block across computer vision, medical imaging, speech recognition, and even natural language processing tasks involving sequential structure.

Beyond standard 2D image convolution, the operation has been extended in many directions: 1D convolution for sequences, 3D convolution for video, depthwise separable convolution for efficiency, and dilated convolution for expanding receptive fields without losing resolution. While attention-based architectures like Vision Transformers have begun to challenge CNNs in some domains, convolution remains one of the most widely used and well-understood operations in modern deep learning.
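The efficiency gain from depthwise separable convolution can be made concrete with a parameter count. A standard convolution learns one k×k×C_in kernel per output channel; the separable version factors this into per-channel k×k filters followed by a 1×1 pointwise mix (illustrative channel counts; function names are ours):

```python
def standard_conv_params(k, c_in, c_out):
    # Each of c_out filters spans all c_in channels with a k x k kernel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # One k x k filter per input channel, then a 1x1 "pointwise" channel mix.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 256
standard = standard_conv_params(k, c_in, c_out)         # 294,912 parameters
separable = depthwise_separable_params(k, c_in, c_out)  # 33,920 parameters
# The separable layer uses roughly 8-9x fewer parameters at these sizes.
```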

Related

CNN (Convolutional Neural Network)
A deep learning architecture that learns spatial hierarchies of features from visual data.
Generality: 875

Local Weight Sharing
Reusing the same weights across spatial positions to detect patterns regardless of location.
Generality: 694

FFT Accelerated Convolutions
Computing convolutions via frequency-domain multiplication for faster large-kernel operations.
Generality: 485

Transposed Convolutional Layer
A learnable layer that upsamples spatial feature maps by reversing the convolution operation.
Generality: 650

Max Pooling
A downsampling operation that retains the maximum value within each local region.
Generality: 694

Local Pooling
A downsampling operation that aggregates local feature map regions into compact, abstract representations.
Generality: 656