
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


Reporting Bias

A systematic distortion in training data caused by selective omission of outcomes or observations.

Year: 2010 · Generality: 694

Reporting bias occurs when the data available for training or evaluating an AI model fails to represent reality because certain outcomes, events, or observations are systematically more likely to be recorded, published, or shared than others. Unlike random noise, this distortion is structured — positive results get documented while negative ones are quietly discarded, dramatic events make headlines while mundane ones go unlogged, and socially sensitive information gets suppressed by institutional or cultural pressures. The result is a dataset that reflects not the world as it is, but the world as it has been selectively described.
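The mechanism can be made concrete with a minimal simulation. In this sketch the true positive rate is 50%, but positive outcomes are recorded far more often than negative ones; the reporting probabilities (0.9 and 0.3) are assumed values chosen purely for illustration:

```python
import random

random.seed(0)

# True process: outcomes are positive half the time.
true_outcomes = [random.random() < 0.5 for _ in range(100_000)]

# Reporting bias: positive outcomes are recorded 90% of the time,
# negative outcomes only 30% of the time (assumed rates for illustration).
reported = [o for o in true_outcomes
            if random.random() < (0.9 if o else 0.3)]

true_rate = sum(true_outcomes) / len(true_outcomes)
reported_rate = sum(reported) / len(reported)

print(f"true positive rate:  {true_rate:.3f}")      # ~0.50
print(f"rate in the dataset: {reported_rate:.3f}")  # ~0.75
```

A model trained only on the reported records would treat the inflated 75% rate as ground truth, with no signal that a structured slice of reality was never logged.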

In machine learning, reporting bias is particularly insidious because models learn statistical patterns from whatever data they are given, with no inherent mechanism to detect what is missing. A sentiment classifier trained on published product reviews will never see the reviews that were flagged and removed. A medical diagnosis model trained on clinical records will underrepresent patients who never sought care. A language model trained on internet text will absorb the implicit assumptions of who writes online and what they choose to say. In each case, the model internalizes the bias as if it were ground truth, producing predictions that systematically fail for underrepresented groups or scenarios.

Reporting bias is closely related to, but distinct from, other forms of dataset bias. Selection bias refers to how samples are chosen; reporting bias specifically concerns which outcomes or attributes within those samples get faithfully recorded. Publication bias — the tendency for journals and repositories to favor statistically significant or positive findings — is one well-studied instance of reporting bias that directly affects benchmark datasets and scientific claims about model performance.

Addressing reporting bias requires both technical and organizational interventions. Auditing datasets for systematic gaps, collecting data through multiple independent channels, and applying reweighting or imputation techniques can partially compensate for known omissions. More fundamentally, building awareness of what incentives shape data collection — commercial, institutional, or social — is essential for practitioners who want their models to generalize reliably beyond the narrow slice of reality their training data actually captures.
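One of the reweighting techniques mentioned above can be sketched as inverse-probability weighting: if the per-outcome reporting rates can be estimated from an external audit, each observed record is weighted by the reciprocal of its reporting probability. The rates below are assumed for illustration, not estimated from real data:

```python
import random

random.seed(1)

# Assumed reporting probabilities per outcome, e.g. from an external audit.
REPORT_PROB = {True: 0.9, False: 0.3}

true_outcomes = [random.random() < 0.5 for _ in range(100_000)]
reported = [o for o in true_outcomes
            if random.random() < REPORT_PROB[o]]

# Naive estimate computed directly from the biased dataset.
naive = sum(reported) / len(reported)

# Inverse-probability weighting: each reported sample stands in for
# 1 / p_report unreported records of the same kind.
weights = [1 / REPORT_PROB[o] for o in reported]
weighted_positive = sum(w for o, w in zip(reported, weights) if o)
ipw = weighted_positive / sum(weights)

print(f"naive estimate:      {naive:.3f}")  # ~0.75, biased
print(f"reweighted estimate: {ipw:.3f}")    # ~0.50, corrected
```

The correction only works when the omission mechanism is known or estimable; gaps caused by incentives nobody has measured cannot be reweighted away, which is why the organizational interventions above remain essential.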

Related

Coverage Bias

A dataset imbalance where underrepresented groups cause skewed model performance.

Generality: 520
Sampling Bias

A data flaw where training samples misrepresent the true population, distorting model behavior.

Generality: 794
Participation Bias

A dataset imbalance where certain groups are over- or underrepresented, skewing model outcomes.

Generality: 524
Bias

Systematic errors in data or algorithms that produce unfair or skewed outcomes.

Generality: 854
Non-Response Bias

Skew introduced when survey non-respondents differ systematically from respondents.

Generality: 383
Historical Bias

Bias in AI systems inherited from prejudiced or unrepresentative historical training data.

Generality: 626