The fraction of correct predictions a classification model makes overall.
Accuracy is a fundamental evaluation metric for classification models, defined as the number of correct predictions divided by the total number of predictions made. It answers a simple question: out of all the examples the model was asked to classify, what proportion did it get right? Expressed as a percentage or a value between 0 and 1, accuracy is intuitive and easy to communicate, making it one of the most commonly reported metrics when comparing models or reporting results to non-technical audiences.
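The definition translates directly into a few lines of code. Here is a minimal sketch in plain Python (the function name and label values are illustrative; libraries such as scikit-learn provide an equivalent `accuracy_score`):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 3 of 4 predictions match the true labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```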
Under the hood, accuracy aggregates true positives and true negatives in the numerator and places all predictions in the denominator. For binary classification, this means correctly identified positive cases plus correctly identified negative cases, divided by the total sample size. For multiclass problems, the logic extends naturally: any prediction where the model's output matches the ground-truth label counts as correct. The metric is computed after inference and requires labeled ground-truth data, making it a supervised evaluation tool.
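The confusion-matrix view of the binary case can be sketched as follows; the counts and the function name are illustrative, but the formula is the standard (TP + TN) / (TP + TN + FP + FN):

```python
def confusion_accuracy(y_true, y_pred, positive=1):
    """Binary accuracy from confusion counts: (TP + TN) / (TP + TN + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return (tp + tn) / (tp + tn + fp + fn)

# Two true positives, one true negative, one false negative -> 3/4
print(confusion_accuracy([1, 1, 0, 1], [1, 1, 0, 0]))  # 0.75
```

For multiclass labels the confusion matrix grows, but the numerator is simply the sum of its diagonal (all label-matching predictions), so the simple "matches divided by total" computation still applies.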
Despite its simplicity, accuracy can be deeply misleading when class distributions are imbalanced. A model trained on a medical dataset where 95% of patients are healthy can achieve 95% accuracy by predicting "healthy" for every patient — while completely failing to detect any disease. This limitation has driven practitioners toward complementary metrics such as precision, recall, F1-score, and the area under the ROC curve, which expose how a model performs on each class individually rather than collapsing everything into a single number.
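The imbalanced-data failure mode described above is easy to demonstrate. The sketch below, with made-up counts matching the 95%-healthy example, shows a trivial model that scores high accuracy while its recall on the disease class is zero:

```python
# 95 healthy patients (label 0), 5 with disease (label 1)
y_true = [0] * 95 + [1] * 5
# A "model" that predicts healthy for everyone
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(1 for t in y_true if t == 1)

print(accuracy)  # 0.95 -- looks excellent
print(recall)    # 0.0  -- detects no disease cases at all
```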
Accuracy remains most reliable when classes are roughly balanced and when the costs of different error types are approximately equal. In competitive machine learning benchmarks — such as ImageNet for image classification — accuracy on a held-out test set became the standard leaderboard metric during the deep learning era of the 2010s, driving rapid architectural progress. Understanding when accuracy is and is not an appropriate metric is considered a foundational skill in applied machine learning, and responsible model evaluation almost always involves reporting it alongside other complementary measures.