A mathematical measure of error that guides model training toward better predictions.
A loss function is a mathematical function that quantifies how far a model's predictions deviate from the true target values. During training, the model produces outputs for a given set of inputs, and the loss function computes a scalar score — often called the loss or cost — that summarizes the magnitude of prediction errors across a batch or dataset. Common examples include mean squared error (MSE) for regression tasks, which penalizes large deviations quadratically, and cross-entropy loss for classification tasks, which measures the divergence between predicted probability distributions and true class labels. The specific choice of loss function encodes assumptions about the problem structure and directly shapes what the model learns to optimize.
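The two common losses mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the `eps` term is an assumed numerical-stability guard, and the example inputs are invented for demonstration.

```python
import numpy as np

# Mean squared error (MSE): penalizes deviations quadratically.
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Cross-entropy between one-hot true labels and predicted
# class probabilities; eps guards against log(0).
def cross_entropy(y_true, probs, eps=1e-12):
    return -np.mean(np.sum(y_true * np.log(probs + eps), axis=1))

# Regression example: two predictions are off by 0.5, one is exact.
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])
print(mse(y_true, y_pred))  # 0.1666...

# Classification example: fairly confident, mostly correct predictions.
labels = np.array([[1, 0], [0, 1]])
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
print(cross_entropy(labels, probs))  # ~0.164
```

Note how each function collapses a whole batch of errors into a single scalar, which is exactly the quantity the optimizer will differentiate.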
The loss function sits at the center of the training loop. Optimization algorithms such as stochastic gradient descent (SGD) compute the gradient of the loss with respect to each model parameter — a process made tractable in neural networks by the backpropagation algorithm — and update parameters in the direction that reduces the loss. This iterative process continues until the loss converges to a minimum or a stopping criterion is met. Because the loss landscape can be highly non-convex in deep networks, the choice of loss function interacts critically with optimizer design, learning rate schedules, and regularization strategies.
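The training loop described above can be sketched for the simplest possible case: gradient descent on MSE for a one-parameter-pair linear model. The gradients here are derived by hand rather than by backpropagation, and the synthetic data, learning rate, and step count are all illustrative choices.

```python
import numpy as np

# Synthetic data from y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    y_hat = w * x + b
    err = y_hat - y
    loss = np.mean(err ** 2)        # scalar loss: MSE over the batch
    grad_w = 2 * np.mean(err * x)   # dL/dw, derived analytically
    grad_b = 2 * np.mean(err)       # dL/db
    w -= lr * grad_w                # step in the loss-reducing direction
    b -= lr * grad_b

print(w, b)  # approaches the true parameters 2.0 and 1.0
```

In a deep network the only structural change is that `grad_w` and `grad_b` are produced by backpropagation rather than written out by hand; the update rule is the same.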
Beyond standard supervised learning, loss functions have been extended and customized for a wide range of settings. Contrastive losses and triplet losses power metric learning systems; adversarial losses define the training dynamics of generative adversarial networks; and reinforcement learning uses reward signals that function analogously to a negative loss. Researchers also design task-specific losses — such as focal loss for class-imbalanced detection problems or perceptual loss for image synthesis — to encode domain knowledge directly into the optimization objective.
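As one concrete instance of a task-specific loss, here is a sketch of the binary focal loss, which adds a modulating factor (1 − p_t)^γ to cross-entropy so that easy, confidently classified examples contribute less. The `gamma` and `alpha` values are the commonly cited defaults, used here only for illustration.

```python
import numpy as np

# Binary focal loss sketch: p is the predicted P(class=1),
# y_true holds 0/1 labels; eps guards against log(0).
def focal_loss(y_true, p, gamma=2.0, alpha=0.25, eps=1e-12):
    p_t = np.where(y_true == 1, p, 1 - p)             # prob of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha) # class-balancing weight
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t + eps))

y = np.array([1, 1, 0])
p = np.array([0.9, 0.6, 0.2])
print(focal_loss(y, p))
```

The confident correct prediction (p = 0.9) is down-weighted by (1 − 0.9)² = 0.01, so the loss concentrates on the harder example (p = 0.6), which is the behavior that makes this loss useful under severe class imbalance.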
The loss function is arguably the most consequential design decision in building a machine learning system. A poorly chosen loss can cause a model to optimize for a proxy objective that diverges from real-world goals, a phenomenon sometimes called Goodhart's Law in ML contexts. Conversely, a well-designed loss function can dramatically accelerate convergence, improve generalization, and align model behavior with intended outcomes, making it a primary lever for both researchers and practitioners.