A model-internal variable whose value is learned directly from training data.
In machine learning, a parameter is any variable that is internal to a model and whose value is determined through the training process rather than set manually beforehand. Parameters encode what a model has learned from data: they are the adjustable knobs that the training algorithm tunes to minimize prediction error. In a linear regression model, the parameters are the intercept and the coefficients assigned to each input feature. In a neural network, they are the weights connecting neurons across layers and the bias terms added at each node. The total count of parameters in a model is often used as a rough proxy for its capacity: larger models with more parameters can represent more complex functions, though they also require more data and compute to train effectively.
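As a minimal sketch in plain NumPy (the feature count and layer widths are illustrative, not drawn from any particular model), the snippet below shows both cases: a coefficient vector and intercept for linear regression, and the weight matrices and bias vectors of a tiny two-layer network, whose sizes sum to the model's parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear regression: the parameters are the coefficient vector w
# (one entry per input feature) and the intercept b.
n_features = 3
w = rng.normal(size=n_features)  # learned coefficients
b = 0.0                          # learned intercept

def predict(x):
    return x @ w + b

print(predict(rng.normal(size=n_features)))  # prediction from current parameters

# A small two-layer network stores its parameters as weight matrices
# and bias vectors; the total count grows quickly with layer width.
W1, b1 = rng.normal(size=(n_features, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
n_params = sum(p.size for p in (W1, b1, W2, b2))
print(n_params)  # (3*16 + 16) + (16*1 + 1) = 81
```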
During training, parameters are updated iteratively by an optimization algorithm such as stochastic gradient descent. The algorithm computes the gradient of a loss function with respect to each parameter and adjusts each value in the direction that reduces the loss. This process repeats over many passes through the training data until the parameters converge to values that minimize the loss and so produce accurate predictions. How parameters are initialized before training begins can significantly influence whether optimization converges at all, and techniques such as Xavier or He initialization were developed specifically to address this.
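The sketch below illustrates this loop on an assumed toy one-feature regression problem (the data, learning rate, and batch size are arbitrary choices for demonstration): a He-style initialization followed by mini-batch stochastic gradient descent on a mean-squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data generated from y = 2x + 1 with a little noise.
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=200)

# He-style initialization: scale initial weights by sqrt(2 / fan_in)
# so early gradients are neither vanishing nor exploding.
w = rng.normal(size=1) * np.sqrt(2.0 / 1)
b = 0.0

lr = 0.1         # learning rate: a hyperparameter, fixed before training
batch_size = 20

for epoch in range(50):
    order = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w + b - y[idx]      # prediction error on this mini-batch
        # Gradients of mean squared error with respect to each parameter.
        grad_w = 2.0 * X[idx].T @ err / len(idx)
        grad_b = 2.0 * err.mean()
        # Step each parameter against its gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)  # should approach [2.] and 1.0
```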
Parameters are distinct from hyperparameters, which govern the structure and behavior of the learning process itself — things like learning rate, batch size, or the number of layers in a network. Hyperparameters are set by the practitioner before training; parameters are discovered by the model during training. This distinction matters because it separates what the model learns from how the learning is configured.
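A minimal sketch of the distinction, with illustrative values: the configuration dictionary below holds hyperparameters that stay fixed for the whole run, while the arrays it sizes are the parameters that training would update.

```python
import numpy as np

# Hyperparameters: chosen by the practitioner before training begins.
config = {
    "learning_rate": 1e-3,
    "batch_size": 32,
    "n_layers": 2,
    "hidden_units": 64,
}

# Parameters: allocated according to the configuration, then learned
# from data. Training updates these arrays; the config never changes.
rng = np.random.default_rng(0)
params = [rng.normal(size=(config["hidden_units"], config["hidden_units"]))
          for _ in range(config["n_layers"])]
print(sum(p.size for p in params))  # total learned values: 2 * 64 * 64 = 8192
```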
The scale of parameters has grown dramatically with modern deep learning. Early neural networks had thousands of parameters; contemporary large language models contain hundreds of billions. This explosion in parameter count has driven much of the recent progress in AI capabilities, but also raises important questions about computational cost, energy consumption, and the interpretability of what these parameters actually represent. Understanding parameter behavior — how they initialize, update, and generalize — remains a central concern in both theoretical and applied machine learning research.