Envisioning is an emerging technology research institute and advisory.

2011 — 2026


Attention Block

A neural network module that selectively weighs input elements by their contextual relevance.

Year: 2017
Generality: 752

An attention block is a modular component in neural networks that enables a model to dynamically focus on the most relevant parts of an input sequence when generating each element of its output. Rather than treating all input positions equally, the block learns to assign varying weights to different positions based on their contextual importance — a capability that proves especially powerful when modeling long-range dependencies that simpler architectures struggle to capture.

The mechanism works by computing three learned projections from the input: queries, keys, and values. The query represents what the current output position is "looking for," while keys represent what each input position "offers." A compatibility score is computed between each query and all keys — typically via scaled dot-product — and passed through a softmax to produce a probability distribution of attention weights. These weights are then used to take a weighted sum of the value vectors, producing a context-aware representation. In practice, modern architectures use multi-head attention, running several attention operations in parallel across different learned subspaces and concatenating the results, allowing the model to simultaneously attend to information from multiple representational perspectives.
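The query–key–value computation described above can be sketched in a few lines of NumPy. This is an illustrative single-head version, not the implementation of any particular library; the function and variable names are chosen for clarity:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the context-aware output (n_queries, d_v) and the
    attention weights (n_queries, n_keys).
    """
    d_k = Q.shape[-1]
    # Compatibility score between each query and every key.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into a probability distribution over positions.
    weights = softmax(scores, axis=-1)
    # Weighted sum of value vectors, guided by those weights.
    return weights @ V, weights

# Toy usage: 4 query positions attending over 6 input positions.
rng = np.random.default_rng(0)
out, w = attention(rng.normal(size=(4, 8)),
                   rng.normal(size=(6, 8)),
                   rng.normal(size=(6, 16)))
```

Each row of `w` sums to 1, so the output at every query position is a convex combination of the value vectors; the scaling by the square root of the key dimension keeps the dot products from saturating the softmax as dimensionality grows.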

Attention blocks became foundational to the transformer architecture introduced by Vaswani et al. in 2017, which replaced recurrent and convolutional layers entirely with stacked attention and feed-forward blocks. This shift unlocked massive parallelism during training, since attention over a sequence can be computed in a single matrix operation rather than step-by-step. The result was dramatically faster training and superior performance on sequence modeling tasks, catalyzing breakthroughs in natural language processing and, later, computer vision and multimodal learning.
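The parallelism point can be made concrete: self-attention over a whole sequence, across all heads, reduces to a handful of batched matrix multiplications with no step-by-step recurrence. Below is a minimal, self-contained multi-head self-attention sketch in NumPy; the weight-matrix names and shapes are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    # Project the entire sequence at once -- no recurrence over positions.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Split into heads: (n_heads, seq_len, d_head).
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    # All heads attend in parallel, each in its own learned subspace.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores, axis=-1) @ Vh      # (n_heads, seq_len, d_head)
    # Concatenate heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Toy usage: a sequence of 5 tokens with d_model = 16 and 4 heads.
rng = np.random.default_rng(1)
d = 16
W = [rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(4)]
Y = multi_head_self_attention(rng.normal(size=(5, d)), *W, n_heads=4)
```

In a full transformer block this output would be followed by residual connections, layer normalization, and a position-wise feed-forward network; the sketch isolates only the attention computation to show that it is a fixed number of matrix operations regardless of sequence length.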

Today, attention blocks are among the most widely deployed components in deep learning. They form the backbone of large language models like GPT and BERT, vision transformers, and cross-modal architectures. Their ability to flexibly route information across arbitrary positions in a sequence — without inductive biases like locality or fixed receptive fields — makes them highly general and adaptable across domains, from protein structure prediction to code generation.

Related

Attention

A mechanism enabling neural networks to dynamically focus on relevant parts of input.

Generality: 875
Attention Mechanisms

Neural network components that dynamically weight input elements by their contextual relevance.

Generality: 865
Attention Network

A neural network that dynamically weights input elements to capture relevant context.

Generality: 796
Attention Mechanism

A neural network technique that dynamically weights input elements by their relevance to the task.

Generality: 875
Self-Attention

A mechanism that lets neural networks weigh relationships between all parts of an input simultaneously.

Generality: 794
Transformer Block

A core neural network module combining self-attention and feedforward layers for sequence modeling.

Generality: 820