Envisioning is an emerging technology research institute and advisory.

2011 — 2026


Coverage Bias

A dataset imbalance where underrepresented groups cause skewed model performance.

Year: 2018 · Generality: 520

Coverage bias refers to the systematic underrepresentation of certain groups, topics, or conditions within a training dataset, causing machine learning models to perform unevenly across different segments of the real world. When a dataset captures some portions of a population or domain far more thoroughly than others, the model trained on it learns a distorted picture of reality — excelling where data is abundant and failing where it is sparse. This is distinct from labeling errors or measurement noise; the problem lies in which examples were collected in the first place.

The mechanism is straightforward: gradient-based learning algorithms optimize for average loss across the training distribution. If a subgroup constitutes only a small fraction of that distribution, errors on its examples contribute little to the total loss, so the model has weak incentive to represent it accurately. The result is a system that may achieve high aggregate accuracy while being unreliable or actively harmful for minority groups — a pattern documented in facial recognition, clinical risk scoring, natural language processing, and many other applied domains.
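A minimal numeric sketch of that incentive gap (hypothetical numbers, not from any real dataset): with a 95%/5% split and equal per-example loss everywhere, even driving the minority group's loss to zero barely moves the average loss that gradient-based training minimizes.

```python
# Hypothetical illustration: how much the aggregate (mean) loss improves
# if we could perfectly fix one group's predictions.

n_major, n_minor = 950, 50        # assumed 95% / 5% subgroup split
n_total = n_major + n_minor

base_loss = 1.0                   # assume per-example loss starts at 1.0 for everyone

# Reduction in the mean loss from driving one group's loss to zero:
gain_fix_minor = (n_minor * base_loss) / n_total   # small lever
gain_fix_major = (n_major * base_loss) / n_total   # large lever

print(f"gain from fixing minority group: {gain_fix_minor:.2f}")  # 0.05
print(f"gain from fixing majority group: {gain_fix_major:.2f}")  # 0.95
```

The optimizer therefore gets nineteen times more aggregate-loss reduction from marginal improvements on the majority group, which is exactly the weak incentive described above.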

Coverage bias often originates upstream of modeling, in decisions about what data to collect, from whom, and under what conditions. Medical datasets historically over-sampled certain demographics; image datasets scraped from the web reflect the demographics of internet users; speech corpora may exclude regional accents or non-native speakers. Because these gaps are invisible inside the dataset itself, they can persist undetected through standard validation pipelines that report only aggregate metrics.
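To see how an aggregate metric can hide such a gap, consider this sketch with invented evaluation results: the overall accuracy looks healthy while one small subgroup performs no better than chance.

```python
from collections import defaultdict

# Hypothetical evaluation results: (subgroup, prediction_correct).
# Group "B" is small and poorly covered by the training data.
results = ([("A", True)] * 930 + [("A", False)] * 20
           + [("B", True)] * 25 + [("B", False)] * 25)

# Aggregate metric — what a standard validation pipeline reports.
overall_acc = sum(ok for _, ok in results) / len(results)

# Disaggregated metrics — accuracy computed per subgroup.
by_group = defaultdict(list)
for group, ok in results:
    by_group[group].append(ok)
group_acc = {g: sum(v) / len(v) for g, v in sorted(by_group.items())}

print(f"aggregate accuracy: {overall_acc:.3f}")  # 0.955 — looks fine
print(f"per-group accuracy: {group_acc}")        # A ≈ 0.98, B = 0.50
```

Only the disaggregated view reveals that group B's performance is a coin flip, which is why per-subgroup evaluation is the standard first step in auditing for coverage bias.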

Addressing coverage bias requires deliberate audit and remediation at the data collection stage — stratified sampling, targeted data acquisition, and disaggregated evaluation across subgroups. Techniques such as reweighting, oversampling underrepresented classes, and fairness-aware training objectives can partially compensate, but they are no substitute for representative data. As AI systems are deployed in high-stakes settings like hiring, lending, and healthcare, coverage bias has become a central concern in responsible AI development, directly linking data curation practices to questions of equity and accountability.
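One of the partial compensations mentioned above, inverse-frequency reweighting, can be sketched in a few lines (assumed group labels and split, for illustration only): each example is weighted by 1 / (number of groups × its group's size), so every subgroup contributes equal mass to the weighted average loss regardless of how many examples it has.

```python
# Inverse-frequency reweighting sketch (hypothetical 95/5 imbalance).
groups = ["A"] * 950 + ["B"] * 50
counts = {g: groups.count(g) for g in set(groups)}
n_groups = len(counts)

# Weight each example so its group's total weight is 1 / n_groups.
weights = [1.0 / (n_groups * counts[g]) for g in groups]

# Verify: each group now accounts for an equal share of the objective.
group_mass = {g: sum(w for gi, w in zip(groups, weights) if gi == g)
              for g in sorted(counts)}
print(group_mass)  # {'A': 0.5, 'B': 0.5} (up to float rounding)
```

This rebalances the optimization objective but cannot add information the dataset never captured, which is why reweighting complements rather than replaces representative data collection.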

Related

Participation Bias
A dataset imbalance where certain groups are over- or underrepresented, skewing model outcomes.
Generality: 524

Sampling Bias
A data flaw where training samples misrepresent the true population, distorting model behavior.
Generality: 794

Reporting Bias
A systematic distortion in training data caused by selective omission of outcomes or observations.
Generality: 694

Bias
Systematic errors in data or algorithms that produce unfair or skewed outcomes.
Generality: 854

Historical Bias
Bias in AI systems inherited from prejudiced or unrepresentative historical training data.
Generality: 626

Algorithmic Bias
Systematic unfairness embedded in algorithmic outputs due to biased data or design.
Generality: 792