Explanations showing which input changes would have produced a different model output.
Counterfactual explanations are a technique in explainable AI (XAI) that answers the question: "What would need to change for this decision to be different?" Rather than explaining why a model made a particular prediction, they identify the minimal or most actionable modifications to an input that would flip the output. For example, a loan applicant denied credit might receive a counterfactual explanation stating: "If your annual income were $8,000 higher and your outstanding debt $2,000 lower, your application would have been approved." This framing makes model behavior concrete and actionable in a way that feature-importance scores or saliency maps often cannot.
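A toy sketch makes the loan example concrete. The credit rule below is invented for illustration (no real lender uses these numbers); it is chosen so that exactly the stated changes, $8,000 more income and $2,000 less debt, flip a denial into an approval:

```python
# Hypothetical credit rule, invented for illustration only.
def approve(income: float, debt: float) -> bool:
    # Approve when income exceeds debt by at least $45,000 (a toy cutoff).
    return income - debt >= 45_000

original = {"income": 42_000, "debt": 6_000}
counterfactual = {"income": 50_000, "debt": 4_000}  # +$8,000 income, -$2,000 debt

print(approve(**original))        # False: application denied
print(approve(**counterfactual))  # True: the counterfactual flips the outcome
```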
Generating counterfactual explanations typically means solving an optimization problem: find an input close to the original that crosses the decision boundary into the desired output class. Proximity is usually measured in feature space, with constraints to ensure the suggested changes are realistic and actionable: age cannot be decreased, for instance, and immutable demographic attributes are often excluded. Methods range from gradient-based search and genetic algorithms to model-agnostic sampling approaches. A key challenge is balancing closeness to the original input against the plausibility and diversity of the suggested alternatives, since a single counterfactual may not capture the full range of ways a decision could change.
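As a sketch of the gradient-based approach, the code below performs descent on a weighted sum of a prediction loss and a proximity penalty, in the spirit of the formulation popularized by Wachter et al. (2017), with simple actionability constraints applied after each step. The linear model, feature layout, and hyperparameters are all assumptions chosen for illustration:

```python
import numpy as np

# Hypothetical linear credit model (weights invented for this sketch).
# Features are [income in $k, debt in $k, age]; score >= 0 means "approve".
w = np.array([0.8, -1.2, 0.05])
b = -30.0

def score(x: np.ndarray) -> float:
    return float(x @ w + b)

def find_counterfactual(x0, margin=1.0, lam=10.0, lr=0.01, steps=5000):
    """Gradient descent on lam * (margin - score)^2 + ||x - x0||^2,
    stopping once the decision boundary (score >= 0) is crossed."""
    x = x0.copy()
    for _ in range(steps):
        s = score(x)
        if s >= 0.0:
            break  # crossed into the desired class
        # The prediction term pushes the score up toward `margin`;
        # the proximity term pulls x back toward the original input.
        grad = -2.0 * lam * (margin - s) * w + 2.0 * (x - x0)
        x -= lr * grad
        # Actionability constraints: age (index 2) is treated as immutable,
        # and debt (index 1) cannot go below zero.
        x[2] = x0[2]
        x[1] = max(x[1], 0.0)
    return x

applicant = np.array([35.0, 12.0, 29.0])  # score is about -14.9: denied
cf = find_counterfactual(applicant)
print("scores:", score(applicant), score(cf))
print("suggested changes:", cf - applicant)
```

In practice the penalty weight is often increased until a valid counterfactual is found, and repeated runs with different distance metrics or starting points supply the diverse alternatives the paragraph calls for.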
Counterfactual explanations have become central to the explainable AI movement because they align naturally with how humans reason about causation and responsibility. They are particularly valuable in high-stakes regulated domains such as credit scoring, hiring, medical diagnosis, and criminal justice, where individuals have a legitimate interest in understanding and potentially contesting automated decisions. The EU's GDPR, with its debated "right to explanation" provisions, has further accelerated adoption. Beyond individual recourse, counterfactuals can also surface systemic model biases: if the changes required to flip a decision are systematically harder for certain demographic groups, that asymmetry reveals a fairness problem the model's aggregate metrics might obscure.
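The bias-auditing idea in the final sentence can be made concrete. For a linear model, the smallest L2 change that reaches the decision boundary is just the point-to-hyperplane distance, so a per-group "recourse cost" can be compared directly; the model weights and group distributions below are synthetic assumptions used only to illustrate the check:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.8, -1.2])  # hypothetical weights for [income in $k, debt in $k]
b = -30.0                  # score >= 0 means "approve"

def recourse_cost(X: np.ndarray) -> np.ndarray:
    """Minimal L2 change needed for each denied applicant to be approved:
    the distance from the point to the decision hyperplane."""
    scores = X @ w + b
    denied = scores < 0
    return -scores[denied] / np.linalg.norm(w)

# Two synthetic demographic groups drawn from different feature distributions.
group_a = rng.normal(loc=[40.0, 8.0], scale=[5.0, 2.0], size=(500, 2))
group_b = rng.normal(loc=[33.0, 10.0], scale=[5.0, 2.0], size=(500, 2))

cost_a, cost_b = recourse_cost(group_a), recourse_cost(group_b)
print(f"mean recourse cost, group A: {cost_a.mean():.2f}")
print(f"mean recourse cost, group B: {cost_b.mean():.2f}")
# A systematically higher cost for one group flags the kind of asymmetry
# that aggregate accuracy metrics would not reveal.
```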