LLMs that learn 'A is B' often fail to infer 'B is A'.
The Reversal Curse refers to a specific failure mode observed in large language models (LLMs) where a model trained on a statement in one direction cannot reliably generalize to its logical inverse. For example, a model that has learned "Olaf Scholz is the Chancellor of Germany" will often fail to answer "Who is the Chancellor of Germany?", even though the question asks for exactly the same fact, merely probed in the reverse direction. This asymmetry reveals that LLMs do not learn facts as structured, bidirectional relationships but instead encode them as directional patterns tied closely to the surface form of training text.
The phenomenon arises from how autoregressive language models are trained: they learn to predict the next token given prior context, which means the order and phrasing of training data heavily shape what associations are formed. If a fact appears predominantly in one syntactic direction in the training corpus, the model builds a strong conditional probability in that direction but not the reverse. This is fundamentally different from how a knowledge graph or relational database would store the same information, where bidirectionality is explicit and guaranteed.
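As a toy illustration of this directionality, the sketch below uses a bigram next-token model, a deliberate simplification of a real LLM; the corpus, the `next_token_prob` helper, and its behavior are assumptions of this example, not part of any published benchmark. Trained only on the forward phrasing of a fact, the model assigns high probability to the forward continuation and zero to the reversed one:

```python
from collections import defaultdict

# Toy stand-in for an autoregressive LM: a bigram next-token model
# trained on a single forward phrasing of a fact.
corpus = ["Olaf Scholz is Chancellor of Germany".split()]

counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    for prev, nxt in zip(sentence, sentence[1:]):
        counts[prev][nxt] += 1

def next_token_prob(prev, nxt):
    """P(nxt | prev) under the bigram model; 0.0 if never observed."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

# Forward direction was seen in training, so the conditional is strong:
print(next_token_prob("Scholz", "is"))          # 1.0
# The reverse direction was never observed, so it gets zero mass:
print(next_token_prob("Chancellor", "Scholz"))  # 0.0
```

The point of the toy model is that nothing in the next-token objective forces the reverse conditional to be learned; the same asymmetry, in a far softer form, is what the Reversal Curse exposes in neural LLMs.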
The Reversal Curse matters because it exposes a deep gap between apparent knowledge and genuine understanding in LLMs. A model may appear to "know" a fact when queried in a familiar form while completely failing when the same fact is probed differently. This has significant implications for retrieval, reasoning, and factual consistency in deployed AI systems. It also challenges the assumption that scaling alone will resolve such logical gaps, since the problem is structural rather than a simple matter of insufficient training data.
First formally documented and named in a 2023 paper by Berglund et al., the Reversal Curse has since become an important benchmark concept for evaluating the reasoning capabilities and knowledge representations of LLMs. It motivates research into better training objectives, data augmentation strategies that include reversed phrasings, and architectural approaches that encourage more symmetric and relational knowledge encoding.
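A minimal sketch of the reversed-phrasing augmentation idea is shown below; `augment_with_reversal` and its fixed templates are hypothetical, and real pipelines typically use an LLM to paraphrase facts in both directions rather than string templates:

```python
def augment_with_reversal(a, b):
    """Emit both directions of the fact 'a is b'.

    Hypothetical helper for illustration only: fixed templates stand in
    for the paraphrase-based augmentation used in actual experiments.
    """
    def cap(s):
        return s[:1].upper() + s[1:]
    return [f"{cap(a)} is {b}.", f"{cap(b)} is {a}."]

# Both phrasings would then be included in the training corpus, so the
# model sees the fact in each syntactic direction.
for line in augment_with_reversal("Olaf Scholz", "the Chancellor of Germany"):
    print(line)
```

Because the model now observes the fact in both directions, the conditional probabilities are trained symmetrically, which is the intuition behind this class of mitigation.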