
Envisioning is an emerging technology research institute and advisory.




Fast Weights

Temporary neural network parameters that rapidly adapt to capture short-term contextual dependencies.

Year: 2016 · Generality: 339

Fast weights are a class of adaptive parameters in neural networks that update on a much shorter timescale than conventional weights, enabling a network to dynamically encode transient information within a single sequence or task. While standard "slow" weights are adjusted gradually through backpropagation across many training examples, fast weights change rapidly in response to recent inputs, effectively acting as a short-term memory that complements the long-term knowledge stored in the network's primary parameters. This two-timescale learning dynamic allows the network to simultaneously maintain stable general knowledge and flexibly adapt to immediate context.

The mechanism typically works by computing an outer product of recent hidden states or activity patterns and accumulating these into a fast weight matrix, which is then used to modulate the network's activations. When a new input arrives, the fast weight matrix biases the network's response based on what it has recently encountered, without permanently altering the slow weights. This is conceptually related to associative memory and Hebbian learning, where co-active neurons strengthen their connections transiently. In practice, fast weights decay over time or across steps, ensuring they capture only genuinely short-term dependencies rather than accumulating indefinitely.
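The decaying outer-product update described above can be sketched in a few lines. This is a minimal illustration in the spirit of the Hebbian rule popularized by Ba, Hinton, and colleagues (2016), not a full implementation; the hidden dimension, decay rate, and fast learning rate here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                    # hidden dimension (illustrative)
lam, eta = 0.95, 0.5     # decay and fast learning rates (illustrative)

A = np.zeros((d, d))     # fast weight matrix: the network's short-term memory

def fast_weight_step(A, h):
    """Decay old associations and accumulate the outer product of the
    current hidden state, so A retains only recently active patterns."""
    return lam * A + eta * np.outer(h, h)

# Process a short sequence of hidden states.
for _ in range(5):
    h = rng.standard_normal(d)
    A = fast_weight_step(A, h)

# At retrieval, A biases activations toward recently seen patterns
# (an associative readout), without touching any slow weights.
query = h
readout = A @ query
```

Because each update adds a decayed symmetric outer product, `A` acts as an auto-associative memory: states similar to recent inputs produce large readouts, while the `lam` decay ensures the matrix forgets older context rather than accumulating indefinitely.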

Fast weights are particularly relevant to recurrent neural networks and attention-based architectures, where modeling short-range dependencies within a sequence is critical. They offer an alternative or complement to mechanisms like LSTMs and self-attention, providing a more biologically plausible account of working memory. The concept also connects to meta-learning, where inner-loop adaptation across a task can be interpreted as a form of fast weight update, making it foundational to approaches like MAML and hypernetwork-based methods.

The idea was originally proposed by Geoffrey Hinton and colleagues in the late 1980s but gained significant renewed traction in 2016 when Jimmy Ba, Geoffrey Hinton, and collaborators demonstrated its utility in modern deep learning contexts. Since then, fast weights have informed the design of memory-augmented networks, neural Turing machines, and efficient transformer variants, cementing their relevance as a conceptual bridge between classical associative memory and contemporary sequence modeling.

Related

  • Hypernetworks: Neural networks that generate the weights or parameters of another neural network. Generality: 580
  • Hypernetwork: A neural network that generates weights for another neural network dynamically. Generality: 575
  • Weight: A learnable parameter that scales the influence of inputs within a model. Generality: 850
  • Open Weights: Publicly released model parameters that enable transparency, reproducibility, and collaborative AI development. Generality: 694
  • Flash Attention: A GPU-optimized attention algorithm that efficiently processes long sequences with reduced memory. Generality: 492
  • Local Weight Sharing: Reusing the same weights across spatial positions to detect patterns regardless of location. Generality: 694