An AI model whose internal decision-making process is opaque or uninterpretable.
In machine learning, a "black box" refers to any model or algorithm whose internal workings are hidden, inaccessible, or too complex to be meaningfully understood by humans. Users can observe what goes in (the input data) and what comes out (predictions or decisions), but the intermediate transformations that connect them remain opaque. Deep neural networks are the canonical example: they may contain hundreds of millions of parameters organized across dozens of layers, making it practically impossible to trace why any specific output was produced from a given input.
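To make the asymmetry concrete, here is a minimal sketch (assuming scikit-learn and synthetic data, both illustrative choices) in which a small network's inputs and outputs are fully observable while its learned parameters carry no human-readable meaning:

```python
# A small MLP as a stand-in for a deep network; the data is synthetic.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
model.fit(X, y)

# The input and the output are both directly observable...
print("input: ", X[0][:5], "...")
print("output:", model.predict(X[:1]))

# ...but the transformations in between are just raw weight matrices.
n_params = sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)
print(f"{n_params} learned parameters, none individually meaningful to a human")
```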
The opacity of black box models is not merely a theoretical concern. In high-stakes domains such as healthcare, criminal justice, and financial lending, decisions made by opaque systems can have profound consequences for individuals. Without visibility into a model's reasoning, it becomes difficult to detect bias, verify regulatory compliance, or assign accountability when something goes wrong. A model might achieve impressive benchmark accuracy while quietly exploiting spurious correlations that would be immediately obvious if its logic were transparent.
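The spurious-correlation risk can be illustrated with a hypothetical setup (synthetic data and scikit-learn; all feature names and numbers here are assumptions): one feature is a near-perfect but accidental cue for the label, so the model scores impressively while the cue holds and collapses the moment it breaks.

```python
# Hypothetical demonstration of a model exploiting a spurious cue.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

def make_data(shortcut_tracks_label):
    y = rng.integers(0, 2, n)
    signal = y + rng.normal(0, 2.0, n)         # weak genuine signal
    if shortcut_tracks_label:
        shortcut = y + rng.normal(0, 0.05, n)  # near-perfect spurious cue
    else:
        shortcut = rng.normal(0.5, 0.05, n)    # cue no longer tracks the label
    return np.column_stack([signal, shortcut]), y

X_train, y_train = make_data(shortcut_tracks_label=True)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

X_iid, y_iid = make_data(shortcut_tracks_label=True)
X_shift, y_shift = make_data(shortcut_tracks_label=False)
print("accuracy while the cue holds: ", model.score(X_iid, y_iid))     # high
print("accuracy once the cue breaks:", model.score(X_shift, y_shift))  # near chance
```

Nothing in the benchmark score hints at the problem; only visibility into what the model actually relies on would reveal it.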
The field of explainable AI (XAI) emerged largely as a response to the black box problem. Techniques such as LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and attention visualization attempt to approximate or illuminate a model's behavior without requiring full transparency into its internals. These post-hoc explanation methods don't open the black box so much as build a more interpretable proxy around it, which itself introduces questions about fidelity and reliability.
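The proxy idea can be sketched with a global surrogate, a simpler relative of LIME and SHAP built on the same premise: train a small, readable model to mimic the black box's predictions, then measure how faithfully it does so. The models and data below are illustrative assumptions, not a prescribed recipe.

```python
# Global surrogate: approximate an opaque model with a shallow decision tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# The surrogate is trained on the black box's *outputs*, not the true labels.
bb_predictions = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, bb_predictions)

# Fidelity: how often the readable proxy agrees with the opaque model.
print(f"surrogate fidelity: {surrogate.score(X, bb_predictions):.2%}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(10)]))
```

The fidelity score is exactly where the reliability question bites: a proxy that agrees with the model most but not all of the time may mislead precisely on the cases that matter most.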
The tension between model complexity and interpretability is one of the defining trade-offs in modern machine learning. Simpler models like linear regression or decision trees are inherently more transparent but often less capable. As practitioners push toward higher performance with increasingly deep and wide architectures, the black box problem intensifies. Regulatory frameworks such as the EU's AI Act and GDPR's "right to explanation" are beginning to formalize the expectation that consequential AI decisions must be explainable — placing interpretability on a collision course with raw predictive power.
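By contrast, the transparency of simple models needs no auxiliary tooling: a linear regression's entire decision process is its coefficient vector. A short sketch (again with scikit-learn and synthetic data):

```python
# A linear model's reasoning is fully inspectable: one weight per feature.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = LinearRegression().fit(X, y)

# Every prediction is just intercept + sum(weight_i * feature_i).
for i, weight in enumerate(model.coef_):
    print(f"feature {i}: weight {weight:+.3f}")
print(f"intercept: {model.intercept_:+.3f}")
```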