A learnable parameter that scales the influence of inputs within a model.
In machine learning, a weight is a learnable numerical parameter that determines how strongly an input signal or feature influences a model's output. In linear models, weights are the coefficients multiplied by input features before summing to produce a prediction. In neural networks, weights are associated with the connections between neurons across layers — each connection carries a scalar weight that scales the activation signal passing through it. The collection of all weights in a model defines its learned representation of the underlying data distribution.
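The weighted-sum behavior of a linear model can be sketched in a few lines. All feature values, weights, and the bias below are illustrative, not taken from any trained model:

```python
# Minimal sketch of a linear model's prediction: each input feature
# is scaled by its weight, then everything is summed with a bias term.

def linear_predict(features, weights, bias):
    """Weighted sum plus bias: w1*x1 + w2*x2 + ... + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

features = [2.0, -1.0, 0.5]   # hypothetical input features
weights  = [0.4, 1.2, -0.3]   # hypothetical learned coefficients
bias     = 0.1

prediction = linear_predict(features, weights, bias)
print(prediction)  # 0.8 - 1.2 - 0.15 + 0.1 = -0.45
```

In a neural network the same pattern repeats at every neuron: each incoming activation is multiplied by a connection weight before the results are summed and passed through an activation function.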
Weights are adjusted during training through an optimization process designed to minimize a loss function. The most common approach is gradient descent, where the gradient of the loss with respect to each weight is computed via backpropagation, and each weight is then nudged in the direction opposite its gradient — the direction that locally reduces the loss. This iterative update process — repeated over many batches of training data — allows a model to progressively encode useful patterns. The learning rate controls the step size of each update, making it one of the most important hyperparameters governing how weights evolve.
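A toy gradient-descent loop makes the update rule concrete. This sketch assumes a single-weight model with loss L(w) = (w − 3)², whose gradient 2(w − 3) is hand-coded here; in a real network the gradient would come from backpropagation:

```python
# Toy gradient descent on L(w) = (w - 3)^2, minimized at w = 3.

def gradient(w):
    """Analytic gradient dL/dw = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)

w = 0.0              # initial weight value
learning_rate = 0.1  # step size of each update

for step in range(100):
    w -= learning_rate * gradient(w)  # step opposite the gradient

print(round(w, 6))  # converges toward the minimum at w = 3
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8, so after 100 updates the weight is effectively at 3; too large a learning rate would instead overshoot and diverge.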
The initial values of weights matter considerably. Poorly initialized weights can lead to vanishing or exploding gradients, stalling or destabilizing training. Techniques such as Xavier and He initialization were developed specifically to set weights at scales appropriate for the depth and activation functions of a given network, enabling stable gradient flow from the outset.
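The scale rules behind these schemes can be sketched directly. The layer sizes below are hypothetical; the formulas are the standard Xavier-uniform and He-normal variants:

```python
import math
import random

def xavier_uniform(fan_in, fan_out):
    """Sample one weight from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return random.uniform(-limit, limit)

def he_normal(fan_in):
    """Sample one weight from N(0, sqrt(2 / fan_in)), suited to ReLU activations."""
    return random.gauss(0.0, math.sqrt(2.0 / fan_in))

# Initialize the weight matrix of a hypothetical 256 -> 128 layer.
layer_weights = [[he_normal(256) for _ in range(128)] for _ in range(256)]
```

Both rules shrink the spread of initial weights as layers get wider, which keeps the variance of activations and gradients roughly constant from layer to layer.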
Weights are central to nearly every aspect of model behavior: they encode what a model has learned, determine its capacity to generalize, and are the primary target of regularization techniques like L1 and L2 penalties, which discourage excessively large weight values to reduce overfitting. In the era of large pretrained models, the sheer count of weights — often in the billions — has become a key metric of model scale and capability. Transferring pretrained weights to new tasks via fine-tuning has also become a dominant paradigm, underscoring how much learned information is stored within a model's weight values.
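The L1 and L2 penalties mentioned above amount to adding a weight-dependent term to the loss. The regularization strength `lam`, the weight values, and the base loss in this sketch are all illustrative:

```python
def l2_penalty(weights, lam):
    """lam * sum of squared weights: penalizes large weight values."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """lam * sum of absolute weights: pushes weights toward exact zeros (sparsity)."""
    return lam * sum(abs(w) for w in weights)

weights = [0.5, -2.0, 1.5]   # hypothetical model weights
base_loss = 0.8              # hypothetical data-fit loss

total_loss = base_loss + l2_penalty(weights, lam=0.01)
print(total_loss)  # 0.8 + 0.01 * (0.25 + 4.0 + 2.25) = 0.865
```

Because the penalty grows with weight magnitude, gradient descent on the combined loss trades some data fit for smaller weights, which tends to improve generalization.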