The total count of learnable weights and biases in a machine learning model.
Parameter size refers to the total number of learnable values — weights, biases, and other trainable quantities — contained within a machine learning model. Each parameter is a scalar value adjusted during training through optimization algorithms like stochastic gradient descent, collectively shaping how the model transforms inputs into outputs. In neural networks, parameters are distributed across layers as weight matrices and bias vectors, and their total count is determined by architectural choices such as layer depth, layer width, and connectivity patterns.
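The arithmetic behind these counts is straightforward. As a minimal sketch, the following counts weights and biases for a fully connected network; the layer widths are hypothetical, chosen only to illustrate the calculation:

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases for a dense network with the given layer widths."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        weights = fan_in * fan_out  # entries in the weight matrix
        biases = fan_out            # one bias per output unit
        total += weights + biases
    return total

# A 784 -> 128 -> 10 network (an MNIST-sized classifier):
# (784*128 + 128) + (128*10 + 10) = 100480 + 1290
print(mlp_param_count([784, 128, 10]))  # -> 101770
```

Note how the count is dominated by the product of adjacent layer widths, which is why widening layers grows parameter size quadratically while deepening the network grows it only linearly.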
The relationship between parameter size and model capability is central to modern deep learning. Larger models can represent more complex functions and capture finer-grained patterns in data, which is why scaling parameter counts has driven many state-of-the-art results across vision, language, and multimodal tasks. GPT-3, for instance, contains 175 billion parameters — a scale that enables remarkably flexible language generation. However, more parameters demand proportionally more memory, compute, and training data, and they increase the risk of overfitting when data is scarce.
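The memory cost of a given parameter count can be estimated with back-of-envelope arithmetic. This sketch assumes the weights are stored as one dense block at a fixed precision, and ignores optimizer state, activations, and any serving overhead:

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate storage for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

params = 175_000_000_000  # GPT-3 scale

print(round(weight_memory_gib(params, 4)))  # float32 -> ~652 GiB
print(round(weight_memory_gib(params, 2)))  # float16 -> ~326 GiB
```

Even at half precision, weights at this scale exceed the memory of any single accelerator, which is why such models are sharded across many devices.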
Parameter size became a defining concern in the era of large language models and foundation models, roughly from 2018 onward, when researchers began systematically studying how model performance scales with parameter count, dataset size, and compute budget. Scaling laws research demonstrated that these relationships follow predictable power-law curves, making parameter size a key variable in deliberate model design rather than an incidental outcome. This spurred both the race toward trillion-parameter models and a parallel effort toward parameter-efficient methods.
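The shape of such a power law can be sketched in a few lines. The functional form below mirrors the parameter-count term used in scaling-laws studies; the constants `n_c` and `alpha` are illustrative placeholders, since real fitted values depend on architecture, data, and tokenization:

```python
def loss_from_params(n_params, n_c=8.8e13, alpha=0.076):
    """Illustrative power-law loss curve: loss falls smoothly as parameters grow."""
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.3f}")
```

The practical value of such a fit is extrapolation: once the curve is estimated from small models, it predicts the benefit of a larger training run before any compute is spent on it.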
Because models with large parameter counts are expensive to store and serve, significant research has focused on reducing effective parameter size without sacrificing performance. Techniques such as pruning, quantization, knowledge distillation, and parameter-efficient fine-tuning methods like LoRA allow practitioners to compress or adapt large models for resource-constrained environments. Understanding parameter size is therefore essential not only for training powerful models but for making them practical across a wide range of real-world applications.
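The savings from low-rank adaptation come directly from parameter arithmetic: a LoRA-style method replaces a full d_in × d_out weight update with two rank-r factors. The dimensions below are hypothetical but typical of a transformer projection layer:

```python
def full_update_params(d_in, d_out):
    """Trainable parameters when fine-tuning the full weight matrix."""
    return d_in * d_out

def lora_update_params(d_in, d_out, rank):
    """Trainable parameters for a rank-r adapter: A (d_in x r) plus B (r x d_out)."""
    return d_in * rank + rank * d_out

d_in = d_out = 4096  # hypothetical projection size
rank = 8

full = full_update_params(d_in, d_out)        # 16,777,216
lora = lora_update_params(d_in, d_out, rank)  # 65,536
print(f"full: {full}, lora: {lora}, reduction: {full // lora}x")  # 256x fewer
```

Because the rank enters linearly while the full matrix scales with the product of its dimensions, small ranks yield large reductions, here 256× fewer trainable parameters, while the frozen base model is left untouched.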