The tendency of AI systems to unfairly favor certain demographic groups, typically as a result of biased training data.
In-group bias in machine learning refers to the tendency of AI systems to produce outputs that systematically favor one demographic, social, or cultural group over others. This phenomenon typically emerges when training data reflects existing societal inequalities — if historical records, text corpora, or labeled datasets over-represent certain groups or encode preferential treatment toward them, models trained on that data will learn and perpetuate those patterns. The result is a system that performs better, assigns higher scores, or makes more favorable decisions for members of the dominant group, while disadvantaging those in out-groups.
The mechanism behind in-group bias is closely tied to how models learn statistical associations. A language model trained predominantly on text authored by or about a particular demographic may develop stronger, more accurate representations for that group. Similarly, a facial recognition system trained mostly on lighter-skinned faces will generalize poorly to darker-skinned individuals. Because these disparities are embedded in the learned weights rather than explicit rules, they can be difficult to detect without targeted auditing. Bias can also be amplified through feedback loops, where biased model outputs influence future data collection, reinforcing the original skew.
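To make the mechanism concrete, here is a minimal sketch (a toy simulation with synthetic data, not drawn from any real system) in which a standard logistic regression is trained on data where one group makes up 95% of the examples and the two groups have different feature-label relationships. A disaggregated audit of the kind described above then surfaces the gap that aggregate accuracy would hide. All names and numbers here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Toy generative process: each group's positive class sits in a
    # different region of feature space, so a boundary fit mostly to
    # one group misplaces the other group's decision boundary.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

# Training set over-represents group A (95% vs 5%).
Xa, ya = make_group(9500, shift=+1.0)   # group A (majority)
Xb, yb = make_group(500,  shift=-1.0)   # group B (minority)
X_train = np.vstack([Xa, Xb])
y_train = np.concatenate([ya, yb])

model = LogisticRegression().fit(X_train, y_train)

# Disaggregated audit: evaluate each group separately on fresh samples
# instead of reporting one aggregate accuracy number.
for name, shift in [("A", +1.0), ("B", -1.0)]:
    X_test, y_test = make_group(2000, shift)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```

Because the learned weights are fit almost entirely to group A's pattern, accuracy for group A typically comes out near-perfect while group B sits close to chance, even though the overall accuracy across the full test set would look strong.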
Addressing in-group bias has become a central concern in the field of algorithmic fairness. Mitigation strategies operate at multiple stages of the ML pipeline: at the data level through resampling, reweighting, or curating more representative datasets; at the model level through fairness-aware training objectives and regularization; and at the output level through post-processing techniques that calibrate predictions across groups. Evaluation requires disaggregated metrics — measuring performance separately across demographic subgroups rather than relying on aggregate accuracy, which can mask significant disparities.
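As one hedged illustration of the data-level strategies above, the sketch below reweights the same kind of skewed synthetic data so that each group contributes equally to the training loss, with weights inversely proportional to group frequency passed through scikit-learn's sample_weight parameter. This is a single simple instance of reweighting under assumed toy data, not a general-purpose fix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

Xa, ya = make_group(9500, +1.0)          # group A (majority)
Xb, yb = make_group(500,  -1.0)          # group B (minority)
X = np.vstack([Xa, Xb])
y = np.concatenate([ya, yb])
group = np.array([0] * 9500 + [1] * 500)

# Data-level mitigation: weight each example inversely to its group's
# frequency so both groups carry equal total weight in the loss.
counts = np.bincount(group)
weights = (len(group) / (2 * counts))[group]

baseline   = LogisticRegression().fit(X, y)
reweighted = LogisticRegression().fit(X, y, sample_weight=weights)

# Disaggregated comparison: baseline vs. reweighted, per group.
for name, shift in [("A", +1.0), ("B", -1.0)]:
    X_t, y_t = make_group(4000, shift)
    print(name,
          round(accuracy_score(y_t, baseline.predict(X_t)), 3),
          round(accuracy_score(y_t, reweighted.predict(X_t)), 3))
```

On this toy problem, reweighting narrows the gap between groups but lowers group A's accuracy, since no single linear boundary serves both groups well. That trade-off is common in practice and is one reason mitigation often combines data-, model-, and output-level interventions rather than relying on any one stage.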
The practical stakes are high. AI systems exhibiting in-group bias have been documented in consequential domains including hiring, credit scoring, criminal risk assessment, and medical diagnosis, where biased outputs can cause real harm to already marginalized populations. This has driven regulatory interest and spurred the development of formal fairness criteria — such as demographic parity, equalized odds, and individual fairness — that provide rigorous frameworks for defining and measuring what it means for a model to treat groups equitably.
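Two of the group-level criteria named above can be computed directly from a model's predictions. The sketch below implements them in plain NumPy under common textbook definitions: demographic parity compares positive-prediction rates across groups, and equalized odds compares true-positive and false-positive rates. The arrays at the bottom are a small hypothetical example, not a real audit.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Gap in positive-prediction rates across groups:
    |P(Yhat=1 | A=a) - P(Yhat=1 | A=b)|. Zero means parity."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_diff(y_true, y_pred, group):
    """Worst gap in TPR or FPR across groups. Equalized odds requires
    both true-positive and false-positive rates to match."""
    gaps = []
    for label in (1, 0):  # label 1 -> TPR gap, label 0 -> FPR gap
        rates = [y_pred[(group == g) & (y_true == label)].mean()
                 for g in np.unique(group)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# Hypothetical predictions for eight individuals in two groups.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_diff(y_pred, group))      # positive-rate gap: 0.5
print(equalized_odds_diff(y_true, y_pred, group))  # worst TPR/FPR gap: 0.5
```

Libraries such as Fairlearn expose equivalent metrics, but as the sketch shows, the underlying definitions fit in a few lines; the harder questions are which criterion is appropriate for a given domain, since the common criteria generally cannot all be satisfied at once.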