Temporary neural network parameters that rapidly adapt to capture short-term contextual dependencies.
Fast weights are a class of adaptive parameters in neural networks that update on a much shorter timescale than conventional weights, enabling a network to dynamically encode transient information within a single sequence or task. While standard "slow" weights are adjusted gradually through backpropagation across many training examples, fast weights change rapidly in response to recent inputs, effectively acting as a short-term memory that complements the long-term knowledge stored in the network's primary parameters. This two-timescale learning dynamic allows the network to simultaneously maintain stable general knowledge and flexibly adapt to immediate context.
The mechanism typically works by computing outer products of recent hidden states or activity patterns and accumulating them into a fast weight matrix, which is then used to modulate the network's activations. When a new input arrives, the fast weight matrix biases the network's response toward what it has recently encountered, without permanently altering the slow weights. This is conceptually related to associative memory and Hebbian learning, in which co-active neurons transiently strengthen their connections. In practice, fast weights decay over time or across steps, ensuring they capture only genuinely short-term dependencies rather than accumulating indefinitely.
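As a concrete illustration, the sketch below implements this decay-and-write cycle in NumPy: a Hebbian outer-product write with exponential decay, whose readout biases the next activation. The variable names, dimensions, and hyperparameter values are assumptions for illustration, not taken from any particular published implementation.

```python
# Minimal sketch of a fast weight update: Hebbian outer-product write
# with exponential decay. Hyperparameters here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

hidden_dim = 8
decay = 0.95        # how quickly old associations fade
learn_rate = 0.5    # how strongly new activity is written in

A = np.zeros((hidden_dim, hidden_dim))  # fast weight matrix, starts empty

def fast_weight_step(A, h):
    """Decay old associations, then bind the current hidden state to itself."""
    return decay * A + learn_rate * np.outer(h, h)

def modulate(A, pre_activation, h_prev):
    """Read out the fast weights: bias the next activation toward
    recently stored patterns before the nonlinearity is applied."""
    return np.tanh(pre_activation + A @ h_prev)

# Toy rollout: each step uses A to shape the response to the current
# input, then writes the new state into A. The slow weights (here
# abstracted into the random pre-activations) are never touched.
h = np.zeros(hidden_dim)
for t in range(5):
    x = rng.standard_normal(hidden_dim)   # stand-in for W_x @ input
    h = modulate(A, x, h)                 # fast weights bias the update
    A = fast_weight_step(A, h)            # short-term memory written here
```

Because `A` is rebuilt from recent activity and decays each step, it behaves as a self-clearing associative store: patterns seen several steps ago contribute less and less to the readout.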
Fast weights are particularly relevant to recurrent neural networks and attention-based architectures, where modeling short-range dependencies within a sequence is critical. They offer an alternative or complement to mechanisms like LSTMs and self-attention, and they provide a more biologically plausible account of working memory. The concept also connects to meta-learning: inner-loop adaptation within a task can be interpreted as a form of fast weight update, a perspective that links fast weights to approaches like MAML and hypernetwork-based methods, as sketched below.
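To make the meta-learning connection concrete, the following sketch treats a few inner-loop gradient steps as a fast weight update layered on frozen slow weights. The toy regression task, step count, and learning rate are assumptions for illustration; real MAML additionally backpropagates through this inner loop to update the slow weights.

```python
# Inner-loop adaptation viewed as a fast weight update (MAML-flavored
# sketch on a toy linear regression task; illustrative only).
import numpy as np

rng = np.random.default_rng(1)

def loss_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    err = X @ w - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

slow_w = rng.standard_normal(3)   # shared, slowly learned parameters
inner_lr = 0.1                    # step size of the fast adaptation

# Task-specific support data for one toy regression task.
X_support = rng.standard_normal((10, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_support = X_support @ true_w

# Inner loop: a few gradient steps produce task-specific "fast" weights
# without touching slow_w itself.
fast_w = slow_w.copy()
for _ in range(5):
    _, g = loss_and_grad(fast_w, X_support, y_support)
    fast_w = fast_w - inner_lr * g

# fast_w now encodes the current task; slow_w remains the reusable prior
# that would be refined across many tasks in the outer loop.
```

The analogy is that `fast_w` plays the role of the rapidly updated, per-task parameters, while `slow_w` corresponds to the conventional weights trained across many tasks.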
The idea was originally proposed by Geoffrey Hinton and colleagues in the late 1980s, and related fast weight architectures were explored by Jürgen Schmidhuber in the early 1990s, but it gained significant renewed traction in 2016 when Jimmy Ba, Geoffrey Hinton, and collaborators demonstrated its utility in modern deep learning contexts. Fast weights have since informed memory-augmented networks and efficient transformer variants, and they share close conceptual ties with architectures such as neural Turing machines, cementing their relevance as a bridge between classical associative memory and contemporary sequence modeling.