A table that breaks down a classifier's predictions against actual class labels.
A confusion matrix is a structured table used to evaluate the performance of a classification model by comparing its predicted labels against the true labels in a dataset. For a binary classifier, the matrix is organized into four cells: true positives (TP), where the model correctly predicts the positive class; true negatives (TN), where it correctly predicts the negative class; false positives (FP), where it predicts positive for an actually negative example; and false negatives (FN), where it predicts negative for an actually positive example. This layout makes it immediately clear not just how often the model is right, but precisely how and where it fails: whether it tends to over-predict one class, under-predict another, or confuse specific categories with each other.
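The four cells can be tallied directly from paired label lists. The following is a minimal sketch using plain Python; the function name `binary_confusion` and the example labels are illustrative, with 1 denoting the positive class.

```python
def binary_confusion(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Illustrative data: 8 examples, the model makes one FP and one FN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_confusion(y_true, y_pred))  # (3, 3, 1, 1)
```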
The confusion matrix serves as the foundation for a wide range of derived performance metrics. Accuracy is the proportion of all correct predictions, but in imbalanced datasets this number can be misleading. Precision measures how many of the model's positive predictions were actually correct, while recall (also called sensitivity) measures how many actual positives the model successfully identified. The F1 score is the harmonic mean of precision and recall, combining them into a single value. Specificity (the true negative rate), the false positive rate, and the Matthews correlation coefficient are also derivable from the same four cells. This richness makes the confusion matrix far more informative than a single accuracy figure alone.
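All of these metrics follow from the same four counts. A sketch of the standard formulas, with guard clauses returning 0.0 for empty denominators (the function name `metrics_from_cells` is an assumption for illustration):

```python
import math

def metrics_from_cells(tp, tn, fp, fn):
    """Derive common classification metrics from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    # Matthews correlation coefficient: uses all four cells, so it stays
    # informative even when the classes are heavily imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity, "mcc": mcc}

# For tp=3, tn=3, fp=1, fn=1: accuracy, precision, recall, F1, and
# specificity all equal 0.75, and MCC = (9 - 1) / 16 = 0.5.
print(metrics_from_cells(3, 3, 1, 1))
```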
For multi-class problems, the confusion matrix extends naturally into an N×N grid, where N is the number of classes. Each row represents the actual class and each column represents the predicted class, so off-diagonal entries reveal exactly which classes the model confuses with which others. A model that frequently misclassifies cats as dogs but rarely makes the reverse error will show an asymmetric pattern that a scalar metric would obscure entirely.
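The cat/dog asymmetry described above shows up directly in the off-diagonal cells. A minimal sketch of an N×N matrix builder, following the rows-as-actual, columns-as-predicted convention used here (the label data is illustrative):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build an NxN matrix: rows = actual class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["dog", "cat", "dog", "dog", "dog", "bird"]
# Cats are misclassified as dogs twice, but dogs are never
# misclassified as cats: an asymmetric off-diagonal pattern.
print(confusion_matrix(y_true, y_pred, labels))  # [[1, 2, 0], [0, 2, 0], [0, 0, 1]]
```

Scalar metrics computed from this matrix would average away the direction of the error; the matrix itself preserves it.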
In practice, confusion matrices guide iterative model improvement. By identifying which class boundaries are poorly learned, practitioners can target data collection, adjust class weights, engineer more discriminative features, or choose alternative algorithms. Visualization tools like heatmaps make large confusion matrices easier to interpret at a glance, and the matrix remains a standard diagnostic output in virtually every classification workflow across computer vision, natural language processing, and tabular data modeling.
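One simple way to turn the matrix into an improvement target is to scan the off-diagonal cells for the largest error count, which names the class pair most in need of additional data or features. A sketch, assuming a row-major matrix with rows as actual classes (the helper name `worst_confusion` is hypothetical):

```python
def worst_confusion(matrix, labels):
    """Return (count, actual_class, predicted_class) for the largest off-diagonal cell."""
    worst = (0, None, None)
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count > worst[0]:
                worst = (count, labels[i], labels[j])
    return worst

matrix = [[1, 2, 0], [0, 2, 0], [0, 0, 1]]
labels = ["cat", "dog", "bird"]
print(worst_confusion(matrix, labels))  # (2, 'cat', 'dog')
```

The same scan generalizes to normalized matrices, where each row is divided by its class total so that rare classes are not drowned out by frequent ones.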