
Attention

A mechanism enabling neural networks to dynamically focus on relevant parts of input.

Year: 2014 · Generality: 875

Attention is a neural network mechanism that allows models to selectively weight different parts of their input when producing an output, rather than treating all input elements equally. Instead of compressing an entire input sequence into a single fixed-length representation, attention computes a set of scores indicating how relevant each input element is to a given output step. These scores are normalized into a probability distribution and used to form a weighted sum of input representations, effectively directing the model's focus toward the most contextually important information at each step.
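In symbols, one common way to write this weighted-sum step (notation ours, not taken from a specific paper): given input representations h_1, …, h_n and a relevance score e_i for each,

```latex
\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)}, \qquad
c = \sum_{i=1}^{n} \alpha_i h_i
```

where the α_i are the normalized attention weights and c is the context vector used at that output step.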

The mechanics of attention typically involve three components: queries, keys, and values. A query represents what the model is currently trying to compute; keys represent the available input elements; and values carry the actual content to be aggregated. Compatibility between a query and each key is measured, most often via dot product, and the scores are scaled by the square root of the key dimension (to keep large dot products from saturating the softmax) before a softmax turns them into attention weights. The final output is a weighted combination of the values, emphasizing those most relevant to the query. This formulation, known as scaled dot-product attention, underpins the Transformer architecture introduced in 2017.
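As a concrete sketch, here is a minimal NumPy implementation of scaled dot-product attention; the function name and shapes are illustrative assumptions, not the API of any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention (illustrative sketch).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns outputs of shape (n_queries, d_v) and the attention
    weights of shape (n_queries, n_keys).
    """
    d_k = Q.shape[-1]
    # Compatibility scores: dot product of each query with each key,
    # scaled by sqrt(d_k) to keep magnitudes stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each query's scores into a probability distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is the weighted sum of the values.
    return weights @ V, weights

# Example: 2 queries attending over 5 keys/values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)  # out: (2, 8); each row of w sums to 1
```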

Attention became central to machine learning with Bahdanau et al.'s 2014 work on neural machine translation, where it solved the bottleneck problem in encoder-decoder models by allowing the decoder to look back at all encoder states rather than relying on a single compressed vector. This dramatically improved translation quality on long sentences. The 2017 Transformer paper then demonstrated that attention alone—without recurrence or convolution—could achieve state-of-the-art results, making attention the dominant paradigm in sequence modeling.

The impact of attention on modern AI is difficult to overstate. It is the foundational operation in large language models such as GPT and BERT, and has been extended to computer vision, multimodal learning, and reinforcement learning. Multi-head attention, which runs several attention operations in parallel and concatenates their outputs, allows models to simultaneously capture different types of relationships within data. Attention mechanisms have enabled models to scale to unprecedented sizes while maintaining the ability to capture long-range dependencies that earlier architectures struggled to represent.
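To make the multi-head idea concrete, the sketch below runs several attention operations in parallel over subspaces of the model dimension and concatenates the results; all names, shapes, and projection matrices are illustrative assumptions, not a specific library's interface.

```python
import numpy as np

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """Minimal multi-head self-attention over a sequence X: (seq_len, d_model).

    W_q, W_k, W_v, W_o: (d_model, d_model) projection matrices.
    Each head attends within its own d_model // n_heads subspace.
    """
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Split the projections into per-head subspaces: (n_heads, seq_len, d_head).
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Scaled dot-product attention inside every head, computed in parallel.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    heads = weights @ Vh                                # (n_heads, seq_len, d_head)
    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o
```

Because each head operates in its own subspace, one head can track, say, positional relationships while another tracks semantic ones, which is the "different types of relationships" the paragraph above refers to.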

Related

Attention Mechanisms

Neural network components that dynamically weight input elements by their contextual relevance.

Generality: 865
Attention Mechanism

A neural network technique that dynamically weights input elements by their relevance to the task.

Generality: 875
Attention Network

A neural network that dynamically weights input elements to capture relevant context.

Generality: 796
Self-Attention

A mechanism that lets neural networks weigh relationships between all parts of an input simultaneously.

Generality: 794
Attention Pattern

A mechanism that lets neural networks selectively focus on relevant parts of input.

Generality: 752
Attention Block

A neural network module that selectively weighs input elements by their contextual relevance.

Generality: 752