A margin-based loss function central to support vector machine classification.
Hinge loss is a loss function designed for binary classification tasks, most famously used in support vector machines (SVMs). It is defined as max(0, 1 − y·f(x)), where y is the true class label (typically +1 or −1) and f(x) is the model's raw predicted score. The function returns zero when a prediction is correct and sufficiently confident — that is, when the predicted score lands on the right side of the decision boundary with a margin of at least one. When a prediction is wrong or falls within the margin, the loss grows linearly, penalizing the model in proportion to how far y·f(x) falls short of the margin target of 1.
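The definition above can be sketched in a few lines; the function name and the example labels/scores here are illustrative, not taken from any particular library:

```python
def hinge_loss(y, score):
    """Binary hinge loss: max(0, 1 - y * f(x)), with y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

# Correct and confident (y * f(x) >= 1): zero loss.
print(hinge_loss(+1, 2.5))   # 0.0
# Correct but inside the margin: linear penalty.
print(hinge_loss(+1, 0.5))   # 0.5
# Wrong side of the boundary: loss keeps growing linearly.
print(hinge_loss(-1, 1.0))   # 2.0
```

Note that the second case is penalized even though the sign of the prediction is correct, which is exactly the margin requirement described above.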
The core insight behind hinge loss is margin maximization. Rather than simply rewarding correct predictions, it demands that correct predictions be correct by a meaningful margin. This geometric intuition drives the SVM's defining property: finding the hyperplane that sits as far as possible from the nearest training examples on either side. Points that are correctly classified beyond the margin contribute zero loss and have no influence on the decision boundary, making the model focus its attention on the most ambiguous, boundary-adjacent examples — the support vectors.
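A small sketch can make the "support vectors carry all the influence" point concrete. The labels and scores below are made-up toy values; any point with y·f(x) > 1 contributes zero loss and zero gradient signal:

```python
# Toy (label, score) pairs; only margin violators contribute loss.
points = [(+1, 3.0), (+1, 1.0), (+1, 0.25), (-1, 0.5)]

for y, fx in points:
    loss = max(0.0, 1.0 - y * fx)
    role = "influences boundary" if y * fx <= 1 else "zero loss, ignored"
    print(f"y*f(x) = {y * fx:+.2f}  loss = {loss:.2f}  ({role})")
```

The first point sits comfortably beyond the margin and is ignored by the optimizer; the remaining three sit on or inside the margin (or on the wrong side entirely) and are the ones that shape the decision boundary.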
Hinge loss is not differentiable at the hinge point where y·f(x) = 1, but it is convex and subdifferentiable everywhere, which means standard optimization techniques like subgradient descent can still be applied effectively. This convexity is a significant practical advantage, as it guarantees that optimization will not get trapped in local minima. Regularization terms are typically added alongside hinge loss to control model complexity and prevent overfitting.
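The subgradient structure is simple: where y·f(x) < 1 the subgradient of the hinge term with respect to w is −y·x, and elsewhere it is zero, so only the regularizer acts. A minimal subgradient-descent sketch for a linear model with L2 regularization (the toy data and hyperparameters are illustrative, not a tuned implementation):

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Subgradient descent on hinge loss + (lam/2)*||w||^2 for f(x) = w.x + b."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Margin violated: hinge subgradient is -yi * xi.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Linearly separable toy data.
X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-1.0, -2.5]]
y = [+1, +1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in X]
```

The case split in the inner loop is the subgradient in action: the loss is not differentiable exactly at y·f(x) = 1, but picking either branch there is a valid subgradient step, which is why convergence guarantees still hold.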
Beyond SVMs, hinge loss appears in structured prediction, ranking problems, and multiclass extensions such as the Weston-Watkins and Crammer-Singer formulations. Its emphasis on margin rather than raw probability makes it philosophically distinct from cross-entropy loss, which dominates deep learning. Hinge loss remains a foundational concept in understanding how geometric margin and statistical generalization are connected in supervised learning.
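As one concrete multiclass extension, the Crammer-Singer formulation penalizes an example unless the true class's score exceeds the best competing class's score by at least 1. A hedged sketch, with illustrative raw scores for three classes:

```python
def crammer_singer_hinge(scores, true_idx):
    """Multiclass hinge: max(0, 1 + max_{j != y} s_j - s_y)."""
    best_wrong = max(s for j, s in enumerate(scores) if j != true_idx)
    return max(0.0, 1.0 + best_wrong - scores[true_idx])

# True class wins by >= 1: zero loss.
print(crammer_singer_hinge([3.0, 1.5, 0.2], true_idx=0))  # 0.0
# True class wins, but by less than the required margin: penalized.
print(crammer_singer_hinge([2.0, 1.5, 0.2], true_idx=0))  # 0.5
```

The Weston-Watkins variant differs in that it sums a hinge term over every wrong class rather than taking only the worst violator.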