Criteria Drift

When the evaluation metrics for an ML model shift over time, changing its measured performance.

Year: 2020 · Generality: 337

Criteria drift refers to the phenomenon in machine learning where the metrics, standards, or benchmarks used to evaluate a model's performance change over time, causing a model that once appeared effective to seem degraded — or vice versa — without any change to the model itself. This is distinct from concept drift, which involves changes in the underlying data distribution, or model decay, which reflects genuine performance degradation. Criteria drift is specifically about the goalposts moving: what counts as "good" performance is redefined, often in response to evolving business requirements, regulatory changes, or a deeper understanding of what the model should actually optimize for.

In practice, criteria drift can manifest in several ways. A fraud detection model might initially be evaluated on raw accuracy, but as stakeholders grow more sophisticated, they shift to precision-recall tradeoffs or F1 scores — suddenly the model looks worse on paper despite unchanged behavior. Similarly, a recommendation system once judged by click-through rate might later be assessed on user retention or satisfaction scores, reflecting a broader organizational shift in values. These changes in evaluation criteria can create confusion about whether a model needs retraining, replacement, or simply re-contextualization.
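To make the fraud-detection case concrete, here is a minimal Python sketch (assuming NumPy and scikit-learn are installed; the "model" and data are hypothetical stand-ins) in which one frozen set of predictions is scored under two criteria. The predictions look excellent under raw accuracy and poor under F1, with nothing about the model or the data having changed.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

# Imbalanced ground truth: roughly 2% fraud, typical of fraud detection.
y_true = (rng.random(10_000) < 0.02).astype(int)

# A lazy "model" that almost always predicts the majority (non-fraud) class.
y_pred = np.zeros_like(y_true)
y_pred[rng.random(10_000) < 0.005] = 1  # a few scattered fraud calls

# Original criterion: raw accuracy. Looks excellent (~0.97).
print(f"accuracy: {accuracy_score(y_true, y_pred):.3f}")

# Revised criterion: F1 on the fraud class. Looks terrible (near 0).
print(f"F1:       {f1_score(y_true, y_pred):.3f}")
```

The two scores describe the exact same frozen behavior; only the yardstick differs.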

Criteria drift matters because it complicates model governance and MLOps pipelines. Automated monitoring systems that trigger retraining alerts based on metric thresholds can fire spuriously when the metrics themselves are redefined. It also creates challenges in longitudinal model comparison: benchmarking a new model against a predecessor becomes unreliable if the evaluation framework has shifted between deployments. Organizations practicing responsible AI development must maintain versioned records of evaluation criteria alongside model versions to ensure apples-to-apples comparisons.
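One way to keep longitudinal comparisons honest is to version the evaluation criteria the same way model artifacts are versioned. The sketch below is illustrative only; the `EvalCriteria` record and registry layout are hypothetical and do not refer to any particular MLOps tool.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class EvalCriteria:
    """One version of an evaluation standard (hypothetical schema)."""
    version: str
    metric: str        # e.g. "accuracy", "f1", "retention_30d"
    threshold: float   # alert if the metric falls below this value
    effective_from: date

# Append-only history of criteria, versioned alongside model releases.
criteria_history = [
    EvalCriteria("c1", "accuracy", 0.95, date(2020, 1, 1)),
    EvalCriteria("c2", "f1", 0.60, date(2021, 6, 1)),
]

# Each evaluation run records which criteria version it was judged under.
model_runs = [
    {"model": "fraud-v1", "criteria": "c1", "score": 0.97},  # passes c1
    {"model": "fraud-v1", "criteria": "c2", "score": 0.08},  # fails c2
]

# Comparisons are only apples-to-apples within one criteria version: the
# "drop" from 0.97 to 0.08 above is criteria drift, not model decay.
```

A monitoring system that reads this history can suppress retraining alerts that coincide with a criteria version change rather than a genuine score drop.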

Addressing criteria drift requires deliberate documentation practices, stakeholder alignment on evaluation frameworks before deployment, and periodic audits that distinguish between genuine model degradation and shifting standards. As machine learning systems are increasingly embedded in high-stakes domains like healthcare, finance, and criminal justice, the ability to detect and account for criteria drift becomes an important component of model accountability and long-term reliability.
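A periodic audit along these lines can be sketched as a two-metric check: re-score a frozen model on a frozen holdout set under both the old and the new criterion. If the old-criterion score is stable while the new-criterion score is low, the change is criteria drift; if both fall, the model itself has likely degraded. The function name, baseline, and metric pairing below are assumptions for illustration, not a standard procedure.

```python
from sklearn.metrics import accuracy_score, f1_score

def audit(y_true, y_pred, old_metric=accuracy_score, new_metric=f1_score,
          old_baseline=0.97, tolerance=0.01):
    """Distinguish criteria drift from model decay on a frozen holdout."""
    old = old_metric(y_true, y_pred)
    new = new_metric(y_true, y_pred)
    if abs(old - old_baseline) <= tolerance:
        verdict = "stable under original criterion: likely criteria drift"
    else:
        verdict = "moved under original criterion: investigate model decay"
    return old, new, verdict

# Usage with a tiny frozen holdout (hypothetical labels and predictions):
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(audit(y_true, y_pred, old_baseline=0.9))
```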

Related

Model Drift

When shifting real-world data patterns cause a deployed ML model's performance to degrade.

Generality: 694
Model Drift Minimization

Techniques that keep ML models accurate as real-world data distributions shift over time.

Generality: 694
Performance Degradation

The decline in an AI model's accuracy or reliability over time or under new conditions.

Generality: 702
Evaluation Overtime Function

A function measuring how model performance changes or degrades over extended time periods.

Generality: 293
Eval (Evaluation)

Measuring an AI model's performance against defined metrics and datasets.

Generality: 838
Non-Stationary Objectives

An optimization target that shifts over time, turning learning into a continuous tracking problem.

Generality: 575