
Attention Mechanism

A neural network technique that dynamically weights input elements by their relevance to the task.

Year: 2014
Generality: 875

The attention mechanism is a core component of modern neural network architectures that enables models to selectively focus on the most relevant parts of an input when producing an output. Rather than compressing all input information into a single fixed-length representation, attention allows the model to dynamically assign importance scores—called attention weights—to different elements of the input sequence. These weights are computed based on the relationship between the current output state and each input element, allowing the network to "attend" to whichever parts of the input are most useful at each step of processing.
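To make the weighting concrete, here is a minimal NumPy sketch of how raw relevance scores become attention weights and a weighted summary of the input. The scores and input vectors are made up purely for illustration:

```python
import numpy as np

def attention_weights(scores: np.ndarray) -> np.ndarray:
    """Normalize raw relevance scores into a probability distribution."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical scores for four input elements; higher = more relevant now.
scores = np.array([0.1, 2.0, 0.3, 1.1])
inputs = np.random.randn(4, 8)        # four elements, 8-dim representations

weights = attention_weights(scores)   # roughly [0.09, 0.58, 0.11, 0.23]
context = weights @ inputs            # weighted sum: the attended representation
```

The second element dominates the context vector because it received the highest score, while the others still contribute in proportion to their weights.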

At a technical level, attention is typically implemented by computing a compatibility score between a query vector and a set of key vectors, then using a softmax function to normalize these scores into a probability distribution over the input. The resulting weights are used to compute a weighted sum of value vectors, producing a context-aware representation. This query-key-value formulation, central to the Transformer architecture introduced in 2017, generalizes naturally to multi-head attention, where multiple attention operations run in parallel to capture different types of relationships simultaneously.
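The query-key-value formulation can be sketched directly. The following NumPy snippet is an illustrative sketch, not a production implementation; the toy dimensions and random projection matrices are assumptions. It computes scaled dot-product attention, softmax(QKᵀ / √d_k)V, and shows how multi-head attention runs several such operations in parallel on slices of the model dimension:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # compatibility scores
    weights = softmax(scores)                       # distribution over keys
    return weights @ V, weights                     # context vectors + weights

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Project X to Q/K/V, attend independently per head, concat, project."""
    n, d = X.shape
    dh = d // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = [scaled_dot_product_attention(Q[:, i*dh:(i+1)*dh],
                                          K[:, i*dh:(i+1)*dh],
                                          V[:, i*dh:(i+1)*dh])[0]
             for i in range(h)]
    return np.concatenate(heads, axis=-1) @ Wo

# Toy self-attention: 6 tokens, model dim 32, 4 heads (all shapes assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 32))
Wq, Wk, Wv, Wo = (rng.normal(size=(32, 32)) * 0.1 for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h=4)
assert out.shape == (6, 32)
```

Each row of the weight matrix sums to one, so every query produces a proper probability distribution over the keys; splitting the model dimension across heads lets each head learn a different notion of relevance.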

The mechanism was first applied to sequence-to-sequence models in 2014 by Bahdanau and colleagues, who used it to improve neural machine translation by allowing the decoder to look back at relevant parts of the source sentence rather than relying solely on a compressed encoder state. This addressed a critical bottleneck in recurrent models and dramatically improved translation quality on long sequences. The subsequent Transformer architecture eliminated recurrence entirely, relying on attention alone to model dependencies across sequences—enabling far greater parallelism and scalability.
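The original Bahdanau-style mechanism used an additive scoring function rather than dot products: the current decoder state and each encoder state are fed through a small learned network to produce a score. A hedged sketch follows, where Wa, Ua, and va stand in for learned parameters and all shapes are illustrative assumptions:

```python
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """score(s, h_j) = va^T tanh(Wa s + Ua h_j), then softmax over positions j.

    s:  current decoder state, shape (d,)
    H:  encoder states, shape (T, d)
    """
    scores = np.tanh(s @ Wa + H @ Ua) @ va  # (T,): one score per source position
    e = np.exp(scores - scores.max())
    weights = e / e.sum()                   # alignment over the source sentence
    context = weights @ H                   # what the decoder "looks back" at
    return context, weights

# Toy usage with assumed dimensions: d = 16, attention dim a = 12, T = 9 words.
d, a, T = 16, 12, 9
rng = np.random.default_rng(1)
Wa, Ua, va = rng.normal(size=(d, a)), rng.normal(size=(d, a)), rng.normal(size=(a,))
context, weights = additive_attention(rng.normal(size=(d,)), rng.normal(size=(T, d)), Wa, Ua, va)
```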

Attention mechanisms have become foundational to virtually all state-of-the-art models in natural language processing, computer vision, and multimodal learning. Beyond performance gains, attention weights offer a degree of interpretability, providing insight into which input features the model prioritizes for a given prediction. The concept has proven so broadly applicable that it now underpins large language models, vision transformers, and cross-modal systems, making it one of the most consequential innovations in modern deep learning.

Related

Attention Mechanisms

Neural network components that dynamically weight input elements by their contextual relevance.

Generality: 865
Attention

A mechanism enabling neural networks to dynamically focus on relevant parts of input.

Generality: 875
Attention Network

A neural network that dynamically weights input elements to capture relevant context.

Generality: 796
Self-Attention

A mechanism that lets neural networks weigh relationships between all parts of an input simultaneously.

Generality: 794
Attention Pattern

A mechanism that lets neural networks selectively focus on relevant parts of input.

Generality: 752
Attention Block

A neural network module that selectively weighs input elements by their contextual relevance.

Generality: 752