Envisioning is an emerging technology research institute and advisory.


Reversal Curse

LLMs that learn 'A is B' often fail to infer 'B is A'.

Year: 2023
Generality: 106

The Reversal Curse refers to a specific failure mode observed in large language models (LLMs): a model trained on a statement in one direction cannot reliably generalize to its logical inverse. For example, a model that has learned "Olaf Scholz is the Chancellor of Germany" will often fail to answer "Who is the Chancellor of Germany?", even though answering requires only the reverse of a fact it has already seen. This asymmetry reveals that LLMs do not learn facts as structured, bidirectional relationships but instead encode them as directional patterns tied closely to the surface form of the training text.

The phenomenon arises from how autoregressive language models are trained: they learn to predict the next token given prior context, which means the order and phrasing of training data heavily shapes what associations are formed. If a fact appears predominantly in one syntactic direction in the training corpus, the model builds a strong conditional probability in that direction but not the reverse. This is fundamentally different from how a knowledge graph or relational database would store the same information, where bidirectionality is explicit and guaranteed.
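The directionality described above can be illustrated with a deliberately tiny stand-in for next-token training: a bigram counter built from a corpus where the fact appears in only one direction. This is a toy sketch, not how an LLM actually stores knowledge, but it shows how conditioning only on preceding context leaves the reverse association empty.

```python
from collections import defaultdict

# Toy stand-in for next-token training: count which token follows which
# in a corpus where the fact appears in only one direction.
corpus = [
    "olaf_scholz is chancellor_of_germany",
    "olaf_scholz is chancellor_of_germany",
]

follows = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1

def predict_next(token):
    """Most likely next token, or None if the token never appeared as context."""
    candidates = follows.get(token)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Forward direction works: the corpus supports olaf_scholz -> is -> chancellor.
print(predict_next("olaf_scholz"))            # is
print(predict_next("is"))                     # chancellor_of_germany
# Reverse direction fails: the name never follows the title in training,
# so no conditional association exists at all.
print(predict_next("chancellor_of_germany"))  # None
```

A real model fails more gracefully (it guesses rather than returning nothing), but the underlying cause is the same: the conditional in one direction is never trained.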

The Reversal Curse matters because it exposes a deep gap between apparent knowledge and genuine understanding in LLMs. A model may appear to "know" a fact when queried in a familiar form while completely failing when the same fact is probed differently. This has significant implications for retrieval, reasoning, and factual consistency in deployed AI systems. It also challenges the assumption that scaling alone will resolve such logical gaps, since the problem is structural rather than a simple matter of insufficient training data.

First formally documented and named in a 2023 paper by Berglund et al., the Reversal Curse has since become an important benchmark concept for evaluating the reasoning capabilities and knowledge representations of LLMs. It motivates research into better training objectives, data augmentation strategies that include reversed phrasings, and architectural approaches that encourage more symmetric and relational knowledge encoding.
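One of the mitigation ideas mentioned above, data augmentation with reversed phrasings, can be sketched in a few lines. The templates and names here are illustrative assumptions, not taken from the Berglund et al. paper; the point is simply that each fact is emitted in both directions before training.

```python
# Hypothetical reversed-phrasing augmentation: for every forward statement
# "A is B", also emit a reversed template so both directions appear in the
# training data. Templates and example names are illustrative only.
def augment_with_reversals(facts):
    """facts: list of (subject, description) pairs."""
    examples = []
    for subject, description in facts:
        examples.append(f"{subject} is {description}.")
        examples.append(f"{description} is {subject}.")  # reversed direction
    return examples

for line in augment_with_reversals([("Olaf Scholz", "the Chancellor of Germany")]):
    print(line)
```

Real augmentation pipelines would use many paraphrase templates rather than a single inverted copula, but the principle is the same: expose the model to both conditional directions of each fact.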

Related

Reversal Course

A training strategy that periodically reverses or adjusts learning direction to improve model performance.

Generality: 96
Reasoning Instability

When AI models produce inconsistent or contradictory reasoning across similar inputs.

Generality: 395
Lost-in-the-Middle

LLMs systematically underuse information positioned in the middle of long contexts.

Generality: 104
LRM (Large Reasoning Models)

Large-scale neural systems explicitly optimized for multi-step, structured reasoning tasks.

Generality: 384
Waluigi Effect

A failure mode where AI models develop coherent but systematically antagonistic or misaligned behavior patterns.

Generality: 420
Negation Problem

The difficulty AI systems face in correctly interpreting negated language and logic.

Generality: 507