A classification metric summarizing precision-recall trade-offs across decision thresholds.
AUCPR, or Area Under the Precision-Recall Curve, is a scalar metric that summarizes a classifier's performance as the area under its precision-recall curve, swept across all possible decision thresholds. Precision measures the fraction of positive predictions that are truly positive, while recall measures the fraction of actual positives that the model correctly identifies. As the decision threshold shifts, these two quantities trade off against each other, tracing a curve in precision-recall space. AUCPR collapses this curve into a single number between 0 and 1, where higher values indicate a model that maintains both high precision and high recall simultaneously.
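To make the two quantities concrete, here is a minimal sketch in plain NumPy that computes precision and recall at one fixed threshold (the function name and the convention of returning precision 1.0 when no positives are predicted are our own assumptions, not part of any standard API):

```python
import numpy as np

def precision_recall_at_threshold(y_true, y_scores, threshold):
    """Precision and recall for a single decision threshold.

    y_true: array of 0/1 labels; y_scores: predicted positive-class scores.
    """
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_scores) >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly flagged positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))  # missed positives

    # Convention (assumed here): precision is 1.0 if nothing was predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```

Evaluating this function over a grid of thresholds traces out the precision-recall curve described above.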
Computing AUCPR typically involves calculating precision and recall at every distinct decision threshold and then approximating the area under the resulting curve. Trapezoidal integration is common, but linear interpolation between points in precision-recall space can overstate the area, so many implementations instead report average precision, a step-wise weighted sum of precisions at each recall increment. A key baseline for comparison is a random classifier, whose AUCPR equals the proportion of positive examples in the dataset, making the metric sensitive to class distribution in a way that directly reflects real-world difficulty. This stands in contrast to AUC-ROC, which compares true positive rates to false positive rates and can remain artificially high even when a model performs poorly on the minority class.
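The computation can be illustrated with scikit-learn's precision_recall_curve, auc, and average_precision_score helpers; the tiny dataset below is made up purely for demonstration:

```python
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

# Illustrative example: 4 positives out of 10 (labels and scores are made up).
y_true   = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
y_scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.65, 0.8, 0.85, 0.9]

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# Trapezoidal integration of the curve (can be optimistic in PR space).
print("trapezoidal AUCPR:", auc(recall, precision))

# Average precision: a step-wise sum that avoids linear interpolation.
print("average precision:", average_precision_score(y_true, y_scores))

# Random-classifier baseline equals the positive-class prevalence.
print("baseline:", sum(y_true) / len(y_true))
```

Comparing the two printed estimates against the prevalence baseline (0.4 here) shows how much ranking skill the model adds over chance.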
AUCPR is especially valuable in domains characterized by severe class imbalance, such as fraud detection, rare disease diagnosis, information retrieval, and anomaly detection. In these settings, the positive class is rare: a model that simply predicts the majority class achieves high accuracy while detecting no positives at all, and even a weakly discriminating model can post a respectable AUC-ROC, because the large pool of true negatives keeps the false positive rate low. AUCPR exposes both failures directly, making it a more honest and actionable metric for practitioners who care about correctly identifying rare but important events.
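The divergence is easy to reproduce on synthetic data. In this sketch (the prevalence, sample size, and score distribution are all arbitrary assumptions), a weakly separating score earns a respectable AUC-ROC while its AUCPR stays close to the tiny positive-class baseline:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Synthetic 1%-positive dataset with a weakly informative score
# (all parameters here are illustrative, not from any real system).
rng = np.random.default_rng(42)
n = 100_000
y_true = rng.binomial(1, 0.01, size=n)
y_scores = rng.normal(loc=y_true * 1.0, scale=1.0)  # modest class separation

print(f"AUC-ROC: {roc_auc_score(y_true, y_scores):.3f}")            # looks respectable
print(f"AUCPR:   {average_precision_score(y_true, y_scores):.3f}")  # far lower
print(f"random-baseline AUCPR: {y_true.mean():.3f}")                # ~0.01 prevalence
```

The gap between the two scores is exactly the failure mode described above: the abundant negatives dilute the false positive rate, while precision collapses as soon as false alarms rival the rare true positives.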
The metric became widely adopted in the machine learning community during the early 2000s as researchers formalized the limitations of accuracy and AUC-ROC on imbalanced benchmarks. It is now a standard evaluation tool in applied machine learning pipelines, particularly in medical informatics, cybersecurity, and natural language processing tasks like named entity recognition, where positive examples are structurally underrepresented.