Neural networks that generate the weights or parameters of another neural network.
A hypernetwork is a neural model whose primary function is to produce the parameters — weights, biases, or modulation factors — of a separate target network. Rather than learning a fixed set of parameters through standard gradient descent, the target network receives its parameters dynamically from the hypernetwork, conditioned on some input such as a task identifier, context embedding, latent code, or timestep. This transforms parameter selection from a static optimization problem into a learned function, enabling the same architectural backbone to behave differently depending on context.
Formally, a hypernetwork h_φ maps conditioning information c to a parameter vector θ = h_φ(c), which is then used by a primary model f_θ to process its actual inputs. In practice, generating full weight tensors for large networks is computationally expensive, so most implementations output compact representations instead — low-rank weight factors, per-channel scale-and-shift vectors, or layer-wise modulation signals — that are then expanded or applied to the target network. This connects hypernetworks to related ideas like fast weights, conditional computation, and parameter-efficient fine-tuning, where only a small number of adaptive parameters are generated rather than entire weight matrices.
Hypernetworks matter because they provide a principled mechanism for fast, amortized adaptation. Instead of running a separate optimization loop for each new task or context, the hypernetwork learns a direct mapping that generalizes across many settings in a single forward pass. This makes them especially valuable in meta-learning, few-shot classification, continual learning, and conditional generative modeling. Dynamic convolutional filters — where filter weights are predicted from input features — are one prominent application, as are neural processes and implicit neural representations conditioned on scene-specific latent codes.
The concept was formalized for modern deep learning by David Ha, Andrew Dai, and Quoc V. Le in their 2016 paper, though antecedents include Schmidhuber's fast weights and indirect encodings like HyperNEAT. Since then, hypernetworks have seen broad adoption across meta-learning, parameter-efficient transfer, and neural architecture search, with ongoing research addressing stability, expressivity, and scalability challenges that arise when generating parameters for increasingly large models.