Understanding complex systems by decomposing them into simpler, interacting components.
Compositional reasoning is the capacity of an AI system to understand, generate, or solve complex problems by decomposing them into simpler constituent parts and modeling the relationships between those parts. Rather than treating a problem as a monolithic whole, a system that reasons compositionally builds up meaning or solutions from smaller, well-understood units — much the way a sentence's meaning emerges from words and grammar, or a scene's interpretation emerges from individual objects and their spatial relationships. This principle, rooted in the linguistic notion of compositionality (that the meaning of a whole is a function of its parts), has become a central test of how deeply AI systems actually understand structure, as opposed to memorizing surface patterns.
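The principle that "the meaning of a whole is a function of its parts" can be made concrete with a toy compositional semantics for arithmetic: the value of an expression is computed bottom-up from the values of its sub-expressions. The grammar and evaluator below are illustrative inventions, not drawn from any particular system.

```python
# Toy compositional semantics: the meaning (here, the value) of an
# expression is a fixed function of the meanings of its parts.
# Expressions are nested tuples ("+", left, right) / ("*", left, right),
# or a bare number as a primitive.

def meaning(expr):
    """Recursively compute the meaning of an expression from its parts."""
    if isinstance(expr, (int, float)):
        return expr  # a primitive's meaning is itself
    op, left, right = expr
    # The whole's meaning depends only on the operator and the
    # meanings of the two sub-expressions.
    if op == "+":
        return meaning(left) + meaning(right)
    if op == "*":
        return meaning(left) * meaning(right)
    raise ValueError(f"unknown operator: {op}")

# "(2 + 3) * 4" as a parse tree
tree = ("*", ("+", 2, 3), 4)
print(meaning(tree))  # 20
```

Because the evaluator never inspects the whole expression at once, it handles arbitrarily deep combinations of the same few primitives — the property neural models are tested for under the name compositional generalization.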
In practice, compositional reasoning appears across many AI subfields. In natural language processing, it underlies tasks like semantic parsing, question answering, and multi-step inference, where a model must chain together facts or logical steps rather than retrieve a single cached answer. In computer vision, it manifests in scene graph generation and visual question answering, where models must identify objects, attributes, and relations simultaneously. In reinforcement learning, compositional approaches allow agents to reuse learned sub-skills to solve novel task combinations. The challenge in all these domains is the same: can a model trained on a finite set of examples generalize to new combinations it has never seen?
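The generalization question at the end of this paragraph is often operationalized as a compositional train/test split: every primitive appears during training, but specific combinations are withheld for testing. A minimal sketch, with primitives and the held-out pair invented for illustration:

```python
from itertools import product

verbs = ["jump", "walk", "run"]
modifiers = ["twice", "left", "around"]

# The full task space of verb-modifier combinations.
all_pairs = list(product(verbs, modifiers))

# Hold out one combination: the model sees "jump" and "around"
# separately in training, but never "jump around" together.
held_out = {("jump", "around")}
train = [p for p in all_pairs if p not in held_out]
test = [p for p in all_pairs if p in held_out]

# Every primitive still appears somewhere in training...
assert all(any(v == verb for v, _ in train) for verb in verbs)
assert all(any(m == mod for _, m in train) for mod in modifiers)
# ...but the test combination itself is novel.
assert ("jump", "around") not in train
print(len(train), len(test))  # 8 1
```

A model that has learned the parts and how they compose should solve the held-out pair; a model that has memorized seen combinations will not.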
Despite impressive progress, compositional reasoning remains a significant weakness of large neural models. Studies have repeatedly shown that transformers and other deep learning architectures can fail on systematically novel compositions even when they master individual components — a gap sometimes called the "compositional generalization" problem. Benchmarks like SCAN, COGS, and gSCAN were specifically designed to expose this limitation by testing models on held-out combinations of known primitives.
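SCAN-style benchmarks pair simple commands with action sequences generated by a small compositional grammar, so the "correct" output for a never-seen combination is still well defined. The interpreter below is a heavily reduced sketch of that idea, covering only a subset of SCAN-like primitives and repetition modifiers, not the benchmark's full grammar:

```python
# A reduced SCAN-like grammar: a primitive action, optionally repeated.
PRIMITIVES = {"jump": ["I_JUMP"], "walk": ["I_WALK"], "look": ["I_LOOK"]}
REPEATS = {"twice": 2, "thrice": 3}

def interpret(command):
    """Map a command like 'jump twice' to its ground-truth action sequence."""
    words = command.split()
    actions = PRIMITIVES[words[0]]
    if len(words) == 2:
        actions = actions * REPEATS[words[1]]
    return actions

print(interpret("walk twice"))   # ['I_WALK', 'I_WALK']
# A compositional split might train on 'walk thrice' and bare 'jump',
# then test on the never-seen combination 'jump thrice':
print(interpret("jump thrice"))  # ['I_JUMP', 'I_JUMP', 'I_JUMP']
```

Because the grammar generates the labels, the benchmark can test exactly the held-out compositions on which sequence models have been shown to fail.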
Addressing this gap has driven research into neuro-symbolic architectures, modular networks, and structured inductive biases that explicitly encode part-whole relationships. These approaches aim to combine the pattern-recognition strengths of neural networks with the systematic, rule-governed generalization that symbolic systems provide. Compositional reasoning is therefore not just a capability to measure but a design target — one that many researchers consider essential for achieving robust, human-like AI generalization.
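The modular-network idea can be sketched abstractly: reusable modules are composed according to a task-specific layout, so a novel task reuses learned parts rather than being learned from scratch. Here plain functions stand in for trained sub-networks; all names and the example task are invented for illustration:

```python
# Each 'module' stands in for a trained sub-network; they are simple
# functions over lists of numbers so the sketch stays runnable.
MODULES = {
    "filter_even": lambda xs: [x for x in xs if x % 2 == 0],
    "double":      lambda xs: [2 * x for x in xs],
    "total":       lambda xs: sum(xs),
}

def compose(layout):
    """Chain the modules named in `layout` into a solver for one task."""
    def solver(xs):
        for name in layout:
            xs = MODULES[name](xs)
        return xs
    return solver

# A novel task ('sum of doubled evens') is solved by recombining
# existing modules, mirroring how modular networks reuse sub-skills.
task = compose(["filter_even", "double", "total"])
print(task([1, 2, 3, 4]))  # 12
```

In actual neuro-symbolic systems the layout itself is typically predicted from the task description, but the part-whole structure mirrors this sketch: fixed, reusable parts and an explicit rule for combining them.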