Methods that make AI decision-making transparent and interpretable to humans.
Explainable AI (XAI) refers to a collection of techniques, frameworks, and design principles aimed at making the internal reasoning of AI systems understandable to human users. As machine learning models — particularly deep neural networks — grew more powerful, they also became increasingly opaque, producing accurate outputs through processes that even their creators could not easily interpret. XAI emerged as a response to this "black box" problem, seeking to bridge the gap between model performance and human comprehension by surfacing the factors, features, and logic that drive a model's predictions.
XAI methods operate at different levels of granularity and use a variety of approaches. Local explanation methods like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) explain individual predictions by identifying which input features contributed most to a specific output. Global methods instead aim to characterize the overall behavior of a model — for example, by approximating a complex model with a simpler, inherently interpretable one like a decision tree. Attention mechanisms in neural networks are sometimes treated as a form of built-in explainability, though their reliability as true explanations remains debated. Saliency maps and gradient-based techniques are commonly used in computer vision to highlight which regions of an image influenced a classification decision.
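To make the local/global distinction concrete, the sketch below fits a LIME-style weighted linear surrogate around one prediction of a black-box classifier, then a global decision-tree surrogate trained to imitate the model everywhere. This is a minimal illustration, not the actual API of the LIME or SHAP libraries; the synthetic data, Gaussian perturbation scale, kernel width, and the helper name local_surrogate are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier

# Illustrative "black box": a random forest on synthetic tabular data.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def local_surrogate(model, x, n_samples=2000, kernel_width=0.75):
    """LIME-style local explanation: fit a proximity-weighted linear
    model on perturbations of a single instance x."""
    rng = np.random.default_rng(0)
    # Sample the neighborhood of x with Gaussian noise (scale is a guess).
    neighbors = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    # Query the black box for class-1 probabilities at each neighbor.
    preds = model.predict_proba(neighbors)[:, 1]
    # Weight neighbors by closeness to x using an RBF kernel.
    dists = np.linalg.norm(neighbors - x, axis=1)
    weights = np.exp(-((dists / kernel_width) ** 2))
    # The linear surrogate's coefficients are the local explanation.
    return Ridge(alpha=1.0).fit(neighbors, preds, sample_weight=weights).coef_

# Local: which features drove the prediction for one specific instance?
print("local attributions:", np.round(local_surrogate(black_box, X[0]), 3))

# Global: approximate the whole model with an inherently interpretable
# decision tree trained to reproduce the black box's own predictions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, black_box.predict(X))
print("surrogate fidelity:", (tree.predict(X) == black_box.predict(X)).mean())
```

The printed fidelity score makes the trade-off tangible: a depth-3 tree is easy to read, but it only approximates the forest it explains, so interpretability is bought at the cost of faithfulness.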
The importance of XAI extends well beyond technical curiosity. In high-stakes domains such as healthcare, criminal justice, and financial lending, opaque AI decisions can have serious consequences for individuals and institutions. Regulators in the European Union have begun codifying such expectations: the GDPR gives individuals the right to contest solely automated decisions and to receive meaningful information about the logic involved, while the AI Act imposes transparency and documentation obligations on high-risk systems. XAI thus sits at the intersection of technical AI research and broader concerns about accountability, fairness, and trust.
DARPA's XAI program, launched in 2016, significantly accelerated research in this area by funding systematic efforts to develop explainable models without sacrificing performance. Since then, XAI has grown into a major subfield, with dedicated conferences, benchmarks, and an expanding toolkit of methods. The central tension — between model complexity and interpretability — remains an active area of research, as practitioners seek explanations that are not only accurate but also meaningful and actionable for the humans who rely on them.