A linear combination of inputs scaled by learned weights, fundamental to neural networks.
A weighted sum is a mathematical operation in which each input value is multiplied by a corresponding weight and the resulting products are added together to produce a single scalar output. In neural networks, this operation sits at the heart of every artificial neuron: given an input vector x and a weight vector w, the weighted sum is computed as z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b, where b is an optional bias term. This value is then typically passed through a nonlinear activation function to produce the neuron's output. The simplicity of the operation belies its power: stacking many such computations across layers, interleaved with nonlinearities (without which the stack would collapse to a single linear map), allows neural networks to approximate arbitrarily complex functions.
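As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The function name neuron_output, the sigmoid choice of activation, and the example values are illustrative rather than canonical:

```python
import numpy as np

def neuron_output(x, w, b=0.0):
    """Weighted sum z = w.x + b, followed by a nonlinear activation."""
    z = np.dot(w, x) + b              # the weighted sum itself
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation (one common choice)

# Example: three inputs, three weights, one bias (values chosen arbitrarily)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w, b=0.25))
```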
The weights in a weighted sum encode the relative importance of each input to the final output. A large positive weight amplifies a feature's influence, a large negative weight suppresses or inverts it, and a weight near zero effectively ignores it. During training, these weights are learned through backpropagation and gradient descent: the network computes a loss measuring prediction error, calculates how that loss changes with respect to each weight, and nudges the weights in the direction that reduces error. Over many iterations, the weighted sums across all layers collectively learn to extract and combine features in ways that solve the target task.
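The sketch below shows this loop in miniature for a single linear neuron, learning its weights by gradient descent on a squared-error loss. The synthetic data, learning rate, and iteration count are arbitrary choices for the example:

```python
import numpy as np

# Toy example: recover the weights of a known weighted sum by gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 samples, 3 features
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b                      # targets produced by a known weighted sum

w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(200):
    pred = X @ w + b                         # weighted sums for all samples
    err = pred - y
    grad_w = 2 * X.T @ err / len(X)          # how the squared-error loss changes per weight
    grad_b = 2 * err.mean()
    w -= lr * grad_w                         # nudge weights opposite the gradient
    b -= lr * grad_b

print(w, b)  # approaches true_w and true_b
```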
Weighted sums are not unique to neural networks: they appear in linear regression, support vector machines, attention mechanisms, and ensemble methods like boosting, where base model predictions are combined using learned or heuristic weights. In the transformer architecture, the scaled dot-product attention mechanism is essentially a weighted sum of value vectors, where the weights are softmax-normalized query-key similarity scores. This makes the weighted sum one of the most pervasive primitives in all of machine learning.
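A minimal NumPy sketch of scaled dot-product attention makes this weighted-sum structure explicit. The shapes and random inputs are illustrative, and the function mirrors the standard formulation rather than any particular library's API:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted sum of the rows of V; the weights
    are softmaxed query-key similarity scores."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # scaled query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one weighted combination of V's rows per query
```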
Understanding weighted sums is essential for interpreting model behavior. Techniques like LIME fit a local linear surrogate, itself a weighted sum of input features, to explain individual predictions, and gradient-based saliency maps can be read as the weights of such a local approximation. The concept also connects directly to the notion of linear separability: a single weighted sum with a threshold, the classic perceptron, can classify any linearly separable dataset, while deeper compositions of weighted sums and nonlinearities handle far more complex decision boundaries.
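A sketch of such a threshold classifier, in the spirit of the perceptron decision rule, follows; the weights and the toy decision boundary (x₀ + x₁ > 1) are hand-picked for illustration:

```python
import numpy as np

def threshold_classifier(x, w, b):
    """A single weighted sum plus a threshold: the perceptron decision rule."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights separating points by whether x0 + x1 > 1
w, b = np.array([1.0, 1.0]), -1.0
print(threshold_classifier(np.array([0.9, 0.8]), w, b))  # 1 (above the line)
print(threshold_classifier(np.array([0.1, 0.2]), w, b))  # 0 (below the line)
```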