The iterative process of optimizing a model's parameters using data.
Training is the core process by which a machine learning model learns from data. During training, the model is repeatedly exposed to examples and adjusts its internal parameters — such as the weights in a neural network — to minimize the discrepancy between its predictions and the correct outputs. This discrepancy is quantified by a loss function, and the adjustments are guided by an optimization algorithm such as stochastic gradient descent. The result is a model whose parameters encode statistical patterns extracted from the training data, enabling it to make useful predictions on new inputs.
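As a concrete illustration, the sketch below fits a one-parameter linear model to synthetic data with plain gradient descent, showing the loss being driven down by repeated parameter updates; the dataset, learning rate, and step count are illustrative assumptions rather than recommended values.

```python
import numpy as np

# Toy dataset: y is roughly 3*x plus noise (the slope 3.0 is an arbitrary choice).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w = 0.0               # single trainable parameter (the slope), initialized arbitrarily
learning_rate = 0.1   # illustrative hyperparameter

for step in range(200):
    y_pred = w * x                          # model prediction
    loss = np.mean((y_pred - y) ** 2)       # mean squared error loss
    grad = np.mean(2 * (y_pred - y) * x)    # gradient of the loss with respect to w
    w -= learning_rate * grad               # gradient descent update

print(f"learned w = {w:.3f}, final loss = {loss:.4f}")  # w ends up near 3.0
```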
The mechanics of training vary by model type, but in deep learning the dominant approach combines forward passes — where inputs flow through the network to produce predictions — with backpropagation, which computes gradients of the loss with respect to each parameter. These gradients indicate how each weight should change to reduce the loss, and the optimizer applies those changes incrementally over many iterations. Hyperparameters such as learning rate, batch size, and regularization strength shape how efficiently and stably this process converges.
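A minimal PyTorch training loop makes these steps explicit: a forward pass, a loss computation, backpropagation, and an optimizer update. The architecture, synthetic batches, and hyperparameter values below are placeholder assumptions chosen only to keep the sketch self-contained.

```python
import torch
from torch import nn

# Illustrative hyperparameters; appropriate values depend on the task and model.
learning_rate = 1e-3
batch_size = 32
num_steps = 1000

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=1e-4)
loss_fn = nn.MSELoss()

for step in range(num_steps):
    # Synthetic batch standing in for real training data.
    inputs = torch.randn(batch_size, 20)
    targets = torch.randn(batch_size, 1)

    predictions = model(inputs)            # forward pass: inputs flow through the network
    loss = loss_fn(predictions, targets)   # quantify the discrepancy

    optimizer.zero_grad()                  # clear gradients from the previous iteration
    loss.backward()                        # backpropagation: compute dLoss/dParameter
    optimizer.step()                       # apply incremental parameter updates
```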
A critical concern during training is generalization: the model must learn the underlying structure of the data rather than memorizing the specific examples it was shown. To monitor this, practitioners typically hold out a validation set that is never used for parameter updates. If performance on the validation set degrades while training loss continues to fall, the model is overfitting. Techniques such as dropout, weight decay, early stopping, and data augmentation are commonly employed to keep overfitting in check and improve generalization to unseen data.
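The sketch below combines several of these ideas (a held-out validation set, dropout, weight decay, and early stopping) in one compact PyTorch loop; the synthetic data, patience value, and other settings are illustrative assumptions.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic regression data split into training and validation sets (illustrative sizes).
X, y = torch.randn(1200, 10), torch.randn(1200, 1)
train_loader = DataLoader(TensorDataset(X[:1000], y[:1000]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[1000:], y[1000:]), batch_size=32)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # weight decay = L2 regularization
loss_fn = nn.MSELoss()

best_val_loss, best_state = float("inf"), None
patience, bad_epochs = 5, 0   # stop after 5 epochs with no validation improvement (illustrative)

for epoch in range(100):
    model.train()
    for xb, yb in train_loader:            # parameter updates use only the training set
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():                   # validation: no gradients, no parameter updates
        val_loss = sum(loss_fn(model(xb), yb).item() for xb, yb in val_loader) / len(val_loader)

    if val_loss < best_val_loss:
        best_val_loss, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:          # early stopping: validation loss stopped improving
            break

model.load_state_dict(best_state)           # restore the best-generalizing parameters
```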
Training can take many forms depending on the learning paradigm. Supervised training relies on labeled input-output pairs; unsupervised training finds structure in unlabeled data; reinforcement learning trains agents through reward signals from environmental interaction; and self-supervised training constructs supervisory signals from the data itself, as in large language model pretraining. Regardless of paradigm, the computational cost of training scales with model size and dataset volume, making efficient training infrastructure — including GPUs, distributed computing, and mixed-precision arithmetic — an essential part of modern machine learning practice.
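As one example of such infrastructure, the sketch below uses PyTorch's automatic mixed precision (torch.autocast with a gradient scaler) to run the forward pass in lower precision while keeping parameter updates numerically stable; the model, synthetic batches, and hyperparameters are placeholder assumptions, and the technique mainly pays off on GPUs.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"   # mixed precision is primarily a GPU optimization

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)   # rescales the loss to avoid fp16 gradient underflow

for step in range(100):
    # Synthetic classification batch standing in for real data (illustrative shapes).
    inputs = torch.randn(64, 512, device=device)
    targets = torch.randint(0, 10, (64,), device=device)

    with torch.autocast(device_type=device, enabled=use_amp):
        loss = loss_fn(model(inputs), targets)   # forward pass runs in lower precision

    optimizer.zero_grad()
    scaler.scale(loss).backward()                # backpropagate on the scaled loss
    scaler.step(optimizer)                       # unscale gradients, then update parameters
    scaler.update()                              # adjust the loss scale for the next step
```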