A loss function measuring the divergence between a model's predicted probability distribution and the true distribution over labels.
Cross-entropy loss is a fundamental objective function used in machine learning, particularly for classification tasks. Rooted in information theory, it measures how well a model's predicted probability distribution aligns with the true distribution of labels. For a given example, the loss is computed as the negative log-probability assigned to the correct class — meaning the model is penalized heavily when it assigns low probability to the right answer. Summed or averaged across a training dataset, this quantity gives the optimizer a differentiable signal to minimize during gradient descent.
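As a concrete sketch of this computation, the following NumPy snippet picks out the probability each example assigns to its correct class and averages the negative logs; the function and argument names here are illustrative, not from any particular library:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Average cross-entropy loss over a batch.

    probs:  (batch, num_classes) predicted probabilities, rows summing to 1
    labels: (batch,) integer indices of the correct classes
    """
    # Negative log-probability assigned to each correct class; eps guards
    # against log(0) when a predicted probability underflows to zero.
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + eps))

probs = np.array([[0.9, 0.05, 0.05],   # confident and correct
                  [0.3, 0.4, 0.3]])    # unsure; true class gets 0.3
labels = np.array([0, 2])
print(cross_entropy(probs, labels))    # (-log 0.9 - log 0.3) / 2 ≈ 0.655
```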
The mechanics of cross-entropy loss make it especially well-suited for training neural networks with softmax output layers. When a model confidently predicts the wrong class, the logarithmic penalty becomes very large, producing strong gradient signals that push weights toward correction. Conversely, when the model is nearly correct, the loss approaches zero and gradients shrink naturally. This behavior accelerates early learning and stabilizes training as the model converges; simpler loss functions such as mean squared error do not provide these properties as cleanly in classification settings.
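A quick numerical illustration of that penalty curve, plus a check of the well-known softmax gradient identity (the gradient of the loss with respect to the logits is the predicted probabilities minus the one-hot target). This is again a NumPy sketch with illustrative values, not a framework implementation:

```python
import numpy as np

# Per-example loss is -log(p_correct): near zero when the model is right,
# unbounded as confidence in the true class vanishes.
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p(correct) = {p:>5}  ->  loss = {-np.log(p):.3f}")
# 0.99 -> 0.010,  0.9 -> 0.105,  0.5 -> 0.693,  0.1 -> 2.303,  0.01 -> 4.605

# For softmax + cross-entropy, the logit gradient is (probs - one_hot),
# so a confident wrong prediction produces a large corrective signal.
logits = np.array([4.0, 0.0, 0.0])           # model is confident in class 0
probs = np.exp(logits) / np.exp(logits).sum()
one_hot = np.array([0.0, 0.0, 1.0])          # but the true class is 2
print(probs - one_hot)                       # ≈ [0.965, 0.018, -0.982]
```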
Cross-entropy loss generalizes naturally to multi-class problems through categorical cross-entropy, and to binary classification through binary cross-entropy (also called log loss). In both cases, minimizing the loss is equivalent to maximizing the log-likelihood of the correct labels under the model's predicted distribution, connecting the loss function directly to the principle of maximum likelihood estimation. This statistical grounding gives cross-entropy loss a principled justification beyond empirical performance.
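A minimal sketch of the binary case makes the likelihood connection concrete; the names `binary_cross_entropy`, `p`, and `y` are hypothetical placeholders, assuming NumPy:

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-12):
    """Log loss for predicted positive-class probabilities p
    against binary targets y in {0, 1}."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Minimizing this over a dataset is the same as maximizing the
# log-likelihood of the labels under a Bernoulli model with parameter p.
p = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])
print(binary_cross_entropy(p, y))  # -(log 0.9 + log 0.8 + log 0.7) / 3 ≈ 0.228
```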
The widespread adoption of cross-entropy loss in deep learning has made it a near-universal default for classification problems, from image recognition to natural language processing. Its compatibility with backpropagation, numerical stability when paired with log-softmax implementations, and strong theoretical foundations have cemented its role as one of the most important tools in a practitioner's toolkit. Understanding cross-entropy loss is essential for diagnosing model behavior, interpreting training curves, and designing effective learning systems.
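The numerical-stability point can be illustrated with the log-sum-exp trick that log-softmax implementations rely on. The sketch below assumes NumPy and is not any library's actual code; real frameworks fuse this with the loss itself:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax: subtracting the max logit
    before exponentiating prevents overflow in exp()."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

logits = np.array([1000.0, 0.0, -1000.0])
# A naive log(softmax(logits)) would overflow at exp(1000) and return
# nan/inf; the shifted form stays finite. Cross-entropy is then just the
# negative log-softmax value at the true class.
print(log_softmax(logits))  # [0., -1000., -2000.]
```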