A systematic distortion in training data caused by selective omission of outcomes or observations.
Reporting bias occurs when the data available for training or evaluating an AI model fails to represent reality because certain outcomes, events, or observations are systematically more likely to be recorded, published, or shared than others. Unlike random noise, this distortion is structured — positive results get documented while negative ones are quietly discarded, dramatic events make headlines while mundane ones go unlogged, and socially sensitive information gets suppressed by institutional or cultural pressures. The result is a dataset that reflects not the world as it is, but the world as it has been selectively described.
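The contrast between random noise and structured omission is easy to demonstrate in a few lines of simulation. The sketch below (pure Python, with recording rates invented for illustration) draws outcomes from a known 50/50 ground truth and then drops records two ways: randomly, and selectively by outcome. Random omission leaves the observed rate intact; selective omission shifts it.

```python
import random

random.seed(0)

TRUE_RATE = 0.50  # ground truth: half of all outcomes are positive
outcomes = [random.random() < TRUE_RATE for _ in range(100_000)]

# Random omission (noise): drop half the records regardless of outcome.
noisy = [o for o in outcomes if random.random() < 0.5]

# Structured omission (reporting bias): positives are recorded 90% of
# the time, negatives only 30% (rates are hypothetical).
biased = [o for o in outcomes if random.random() < (0.9 if o else 0.3)]

print(f"true positive rate:       {TRUE_RATE:.2f}")
print(f"after random omission:    {sum(noisy) / len(noisy):.2f}")    # ~0.50
print(f"after selective omission: {sum(biased) / len(biased):.2f}")  # ~0.75
```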
In machine learning, reporting bias is particularly insidious because models learn statistical patterns from whatever data they are given, with no inherent mechanism to detect what is missing. A sentiment classifier trained on published product reviews will never see the reviews that were flagged and removed. A medical diagnosis model trained on clinical records will underrepresent patients who never sought care. A language model trained on internet text will absorb the implicit assumptions of who writes online and what they choose to say. In each case, the model internalizes the bias as if it were ground truth, producing predictions that systematically fail for underrepresented groups or scenarios.
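To make the "no inherent mechanism to detect what is missing" point concrete, here is a minimal sketch in the spirit of the review example. The removal rate, class balance, and helper name are all invented for illustration. A majority-class baseline is fit on a corpus from which most negative reviews were removed before publication: the learned class prior is badly skewed, yet a held-out split of the same censored corpus reports high accuracy, so nothing in the standard workflow flags the failure.

```python
import random

random.seed(1)

TRUE_NEG_RATE = 0.40   # in reality, 40% of reviews are negative
P_REMOVED = 0.70       # moderation removes 70% of negatives (hypothetical)

def sample_published(n):
    """Reviews that survive moderation; removed ones are never seen."""
    out = []
    while len(out) < n:
        neg = random.random() < TRUE_NEG_RATE
        if neg and random.random() < P_REMOVED:
            continue  # flagged and removed before anyone can train on it
        out.append("neg" if neg else "pos")
    return out

train, test = sample_published(40_000), sample_published(10_000)

# A majority-class baseline estimates P(neg) from the visible data only.
p_neg = train.count("neg") / len(train)
predict = "neg" if p_neg > 0.5 else "pos"  # here: always predicts "pos"

# Held-out evaluation on the same censored distribution looks fine...
acc_censored = sum(label == predict for label in test) / len(test)

# ...but accuracy against the true review distribution is much worse.
true_labels = ["neg" if random.random() < TRUE_NEG_RATE else "pos"
               for _ in range(10_000)]
acc_true = sum(label == predict for label in true_labels) / len(true_labels)

print(f"learned P(neg):            {p_neg:.2f}")         # ~0.17, not 0.40
print(f"accuracy on censored test: {acc_censored:.2f}")  # ~0.83
print(f"accuracy on true data:     {acc_true:.2f}")      # ~0.60
```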
Reporting bias is closely related to, but distinct from, other forms of dataset bias. Selection bias refers to how samples are chosen; reporting bias specifically concerns which outcomes or attributes within those samples get faithfully recorded. Publication bias — the tendency for journals and repositories to favor statistically significant or positive findings — is one well-studied instance of reporting bias that directly affects benchmark datasets and scientific claims about model performance.
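The distinction can also be stated in code. In the hypothetical sketch below (field names and rates are invented), selection bias determines which records enter the sample at all, while reporting bias leaves every record in the sample but blanks out the outcome field whenever the result was negative, so the naive rate computed over recorded outcomes is wildly optimistic.

```python
import random

random.seed(2)

# Hypothetical study records: a participant group and a boolean result.
trials = [{"group": random.choice(["young", "old"]),
           "improved": random.random() < 0.5}
          for _ in range(10_000)]

# Selection bias: which records enter the sample at all
# (here, only studies run on young participants get collected).
selected = [t for t in trials if t["group"] == "young"]

# Reporting bias: every record is sampled, but the outcome is written
# down only when it is positive; negative results are left blank.
reported = [dict(t, improved=t["improved"] or None) for t in trials]

known = [t for t in reported if t["improved"] is not None]
print(f"{len(selected)} of {len(trials)} records sampled (selection bias)")
print(f"improvement rate among recorded outcomes: "
      f"{sum(t['improved'] for t in known) / len(known):.2f}")  # 1.00
```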
Addressing reporting bias requires both technical and organizational interventions. Auditing datasets for systematic gaps, collecting data through multiple independent channels, and applying reweighting or imputation techniques can partially compensate for known omissions. More fundamentally, building awareness of what incentives shape data collection — commercial, institutional, or social — is essential for practitioners who want their models to generalize reliably beyond the narrow slice of reality their training data actually captures.
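As one example of the reweighting approach, inverse-probability weighting can compensate when the recording probabilities are known or estimable. In the sketch below (rates and the auditing premise are invented), each recorded outcome is weighted by the reciprocal of its chance of being recorded, recovering the true rate that the naive estimate misses.

```python
import random

random.seed(3)

TRUE_RATE = 0.50
# Recording probabilities, assumed known from a data-collection audit
# (both the audit and the exact rates are hypothetical).
P_RECORD = {True: 0.90, False: 0.30}

recorded = []
for _ in range(100_000):
    outcome = random.random() < TRUE_RATE
    if random.random() < P_RECORD[outcome]:
        recorded.append(outcome)

# The naive estimate from recorded data alone is badly skewed...
naive = sum(recorded) / len(recorded)

# ...but weighting each record by 1 / P(recorded | outcome) compensates.
weights = [1.0 / P_RECORD[o] for o in recorded]
corrected = sum(w for o, w in zip(recorded, weights) if o) / sum(weights)

print(f"naive estimate:      {naive:.2f}")      # ~0.75
print(f"reweighted estimate: {corrected:.2f}")  # ~0.50
```

The correction only reaches omissions whose rates have actually been measured; biases the audit never surfaced remain invisible, which is why the organizational measures above matter as much as the statistical ones.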