A neural network that generates weights for another neural network dynamically.
A hypernetwork is a neural network whose primary function is to generate the parameters — weights and biases — of a separate target network, rather than learning those parameters directly through standard gradient descent. This creates a two-level architecture: the hypernetwork takes some input (such as a task description, a layer index, or an embedding) and outputs the weights that the target network will use to process its own inputs. The result is a system where the target network's behavior can be conditioned on external information, making it far more flexible than a conventionally trained model with fixed parameters.
The mechanics of hypernetworks vary by application. In some designs, a single hypernetwork generates all weights for every layer of the target network, often by taking a small learned embedding per layer as input. In others, the hypernetwork is conditioned on task-level information, allowing the target network to rapidly reconfigure itself for new tasks without retraining. This makes hypernetworks especially well-suited to meta-learning and few-shot learning scenarios, where a model must adapt quickly from limited examples. They also appear in continual learning, neural architecture search, and sequence modeling, where weight generation can be tied to temporal or contextual signals.
One of the key practical advantages of hypernetworks is parameter efficiency. Rather than storing a unique, full set of weights for every possible configuration or task, a compact hypernetwork can implicitly encode a much larger space of models. This compression can reduce memory requirements while preserving expressive power. Additionally, because the hypernetwork imposes a structured prior over the weight space, it can act as a form of regularization, sometimes improving generalization compared to directly learned weights.
Hypernetworks gained significant traction following the 2016 paper by David Ha, Andrew Dai, and Quoc V. Le, which demonstrated their utility across recurrent and convolutional architectures. Since then, the concept has expanded considerably, with hypernetworks appearing in Bayesian deep learning (where they parameterize weight distributions), neural radiance fields, and large language model adaptation techniques like LoRA-adjacent methods. They represent a broader shift in deep learning toward architectures that treat model weights themselves as learnable, dynamic objects rather than fixed endpoints of optimization.