
Envisioning is an emerging technology research institute and advisory.



Accuracy

The fraction of correct predictions a classification model makes overall.

Year: 1990 · Generality: 875

Accuracy is a fundamental evaluation metric for classification models, defined as the number of correct predictions divided by the total number of predictions made. It answers a simple question: out of all the examples the model was asked to classify, what proportion did it get right? Expressed as a percentage or a value between 0 and 1, accuracy is intuitive and easy to communicate, making it one of the most commonly reported metrics when comparing models or reporting results to non-technical audiences.
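The definition above reduces to a one-line computation. As a minimal sketch (with hypothetical labels for illustration):

```python
# Minimal sketch: accuracy = correct predictions / total predictions.
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical ground-truth and predicted labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 6 of 8 correct -> 0.75
```

Multiplying the result by 100 gives the familiar percentage form.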

Under the hood, accuracy aggregates true positives and true negatives in the numerator and places all predictions in the denominator. For binary classification, this means correctly identified positive cases plus correctly identified negative cases, divided by the total sample size. For multiclass problems, the logic extends naturally: any prediction where the model's output matches the ground-truth label counts as correct. The metric is computed after inference and requires labeled ground-truth data, making it a supervised evaluation tool.
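In binary classification, this aggregation can be written directly in terms of the four confusion-matrix cells. A sketch, using hypothetical counts:

```python
# Binary accuracy from confusion-matrix counts: (TP + TN) / all predictions.
def accuracy_from_counts(tp, tn, fp, fn):
    """tp/tn = correctly identified positives/negatives; fp/fn = errors."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion matrix: 40 TP, 50 TN, 5 FP, 5 FN.
print(accuracy_from_counts(tp=40, tn=50, fp=5, fn=5))  # 90 / 100 -> 0.9
```

The same ratio generalizes to the multiclass case: sum the diagonal of the confusion matrix and divide by the total number of predictions.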

Despite its simplicity, accuracy can be deeply misleading when class distributions are imbalanced. A model trained on a medical dataset where 95% of patients are healthy can achieve 95% accuracy by predicting "healthy" for every patient — while completely failing to detect any disease. This limitation has driven practitioners toward complementary metrics such as precision, recall, F1-score, and the area under the ROC curve, which expose how a model performs on each class individually rather than collapsing everything into a single number.
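The medical example above is easy to reproduce numerically. This sketch uses synthetic labels (95 healthy patients coded 0, 5 sick patients coded 1) to show a degenerate classifier scoring high accuracy while achieving zero recall:

```python
# A classifier that predicts "healthy" (0) for everyone still scores 95%
# accuracy when 95% of patients are healthy -- but detects no disease at all.
y_true = [0] * 95 + [1] * 5   # synthetic: 95 healthy, 5 sick
y_pred = [0] * 100            # predict "healthy" for every patient

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.95 -- looks impressive
print(recall)    # 0.0  -- not a single sick patient is identified
```

Recall (and the other metrics named above) exposes exactly the failure that accuracy hides here.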

Accuracy remains most reliable when classes are roughly balanced and when the costs of different error types are approximately equal. In competitive machine learning benchmarks — such as ImageNet for image classification — accuracy on a held-out test set became the standard leaderboard metric during the deep learning era of the 2010s, driving rapid architectural progress. Understanding when accuracy is and is not an appropriate metric is considered a foundational skill in applied machine learning, and responsible model evaluation almost always involves reporting it alongside other complementary measures.

Related

Prediction Error
The gap between a model's predicted values and the actual observed outcomes.
Generality: 875

Average Precision
A single-score metric summarizing model performance across all precision-recall thresholds.
Generality: 700

Eval (Evaluation)
Measuring an AI model's performance against defined metrics and datasets.
Generality: 838

Validation Metric
A quantitative measure used to evaluate model performance on held-out data.
Generality: 780

Confusion Matrix
A table that breaks down a classifier's predictions against actual class labels.
Generality: 796

Precision-Recall Curve
A plot evaluating classifier performance by trading off precision against recall across thresholds.
Generality: 729