Systematic errors in data or algorithms that produce unfair or skewed outcomes.
In machine learning, bias refers to systematic errors that cause a model to produce skewed, inaccurate, or unfair outputs. It manifests in several distinct but related forms: data bias, algorithmic bias, and societal bias. Data bias occurs when training datasets fail to accurately represent the target population — for example, a facial recognition system trained predominantly on light-skinned faces will perform poorly on darker-skinned individuals. Algorithmic bias emerges when modeling choices, objective functions, or feature engineering inadvertently encode or amplify existing prejudices, even when no discriminatory intent is present.
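One way to catch the data-bias problem described above is a simple representation audit: compare group frequencies in the training set against a reference population. The sketch below is illustrative only; the group labels and the 50/50 reference shares are assumptions, not drawn from any real dataset.

```python
from collections import Counter

def representation_gap(samples, reference):
    """Compare group frequencies in a dataset against a reference population.

    samples:   list of group labels observed in the training data
    reference: dict mapping group label -> expected population share
    Returns a dict of group -> (observed share - expected share).
    """
    counts = Counter(samples)
    total = len(samples)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference.items()
    }

# Hypothetical dataset that over-represents group "A" relative to a
# 50/50 population split:
data = ["A"] * 80 + ["B"] * 20
gaps = representation_gap(data, {"A": 0.5, "B": 0.5})
print(gaps)  # group "A" is over-represented by 30 percentage points
```

A large positive or negative gap flags a group for which the model's error rates deserve closer scrutiny before deployment.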
Bias also has a precise statistical meaning: it is one component of the bias-variance tradeoff, describing the error introduced when a model makes overly simplistic assumptions about the data-generating process. A high-bias model underfits the training data, failing to capture meaningful patterns. This statistical definition and the fairness-related definition are conceptually linked — both describe systematic, non-random errors — but they operate at different levels of abstraction and concern different stakeholders.
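The statistical sense of bias is easy to see empirically. In this minimal sketch (the quadratic ground truth and noise level are arbitrary choices for illustration), a degree-1 polynomial is too simple for data generated by a quadratic function, so it underfits and carries a much higher training error than a degree-2 fit.

```python
import numpy as np

# Generate noisy samples from a quadratic data-generating process.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(scale=0.5, size=x.shape)

# Fit a high-bias (linear) and a well-specified (quadratic) model.
mse = {}
for degree in (1, 2):
    coeffs = np.polyfit(x, y, degree)                    # least-squares fit
    mse[degree] = float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print(mse)  # the linear model's error is far larger than the quadratic's
```

The linear model cannot represent the curvature no matter how much data it sees: that irreducible, systematic error is exactly what "bias" means in the statistical sense.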
The societal dimension of bias has become increasingly critical as AI systems are deployed in high-stakes domains such as hiring, credit scoring, medical diagnosis, and criminal justice. When models trained on historically biased data are used to make consequential decisions, they risk perpetuating and even amplifying existing inequalities. Research by scholars like Joy Buolamwini and Timnit Gebru demonstrated measurable disparities in commercial AI systems, galvanizing the field of algorithmic fairness and prompting regulatory attention worldwide.
Addressing bias requires intervention at multiple stages of the machine learning pipeline. Practitioners can audit training data for representation gaps, apply fairness constraints during model training, use post-processing techniques to equalize outcomes across groups, and conduct ongoing monitoring after deployment. No single mitigation strategy eliminates bias entirely, and different fairness criteria — such as demographic parity, equalized odds, or individual fairness — can be mathematically incompatible with one another. This makes bias one of the most technically and ethically complex challenges in modern AI development.
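The incompatibility between fairness criteria mentioned above can be made concrete. The sketch below (group labels, predictions, and outcomes are all hypothetical) computes a demographic parity gap and the true-positive-rate component of equalized odds for the same classifier, showing that satisfying one criterion does not imply satisfying the other.

```python
def rate(values):
    """Fraction of 1s in a non-empty list of 0/1 values."""
    return sum(values) / len(values)

def demographic_parity_gap(preds, groups):
    """Difference in positive-prediction rate between groups 'a' and 'b'."""
    a = [p for p, g in zip(preds, groups) if g == "a"]
    b = [p for p, g in zip(preds, groups) if g == "b"]
    return rate(a) - rate(b)

def true_positive_rate_gap(preds, labels, groups):
    """One component of equalized odds: TPR difference between groups."""
    a = [p for p, y, g in zip(preds, labels, groups) if g == "a" and y == 1]
    b = [p for p, y, g in zip(preds, labels, groups) if g == "b" and y == 1]
    return rate(a) - rate(b)

# Hypothetical classifier outputs for two groups of four people each:
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 0, 0, 1, 0, 0, 0]   # true outcomes
preds  = [1, 1, 0, 0, 0, 1, 0, 1]   # model decisions

dp_gap  = demographic_parity_gap(preds, groups)          # 0.0: parity holds
tpr_gap = true_positive_rate_gap(preds, labels, groups)  # 1.0: odds do not
print(dp_gap, tpr_gap)
```

Here both groups receive positive decisions at the same rate, yet every qualified member of group "a" is accepted while the qualified member of group "b" is rejected — equal selection rates coexisting with starkly unequal error rates.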