Model Collapse (Silent Collapse)

Progressive AI degradation caused by recursive training on AI-generated synthetic data.

Year: 2023
Generality: 339

Model collapse, sometimes called silent collapse, is a failure mode in which generative AI systems—particularly large language models—degrade in quality when trained iteratively on data produced by other AI models rather than authentic human-generated content. Because modern AI systems increasingly scrape the open web for training data, and because that web is rapidly filling with AI-generated text and images, the risk of inadvertently training on synthetic outputs has grown substantially. The result is a feedback loop in which each successive generation of models inherits and amplifies the distortions of its predecessors.

The mechanism behind model collapse is rooted in statistical drift. When a model generates synthetic data, it approximates the true underlying distribution of its training set but inevitably introduces small errors—overrepresenting common patterns and underrepresenting rare or complex ones. When that synthetic data becomes the basis for the next round of training, those approximation errors compound. Rare linguistic constructions, minority viewpoints, and nuanced factual relationships are progressively squeezed out, while the model's outputs converge toward a narrower, blander, and often less accurate representation of reality. Crucially, this degradation can be subtle in early iterations, making it difficult to detect before significant damage has accumulated.
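This dynamic can be illustrated with a toy simulation. The sketch below is not from the article and is far simpler than real LLM training: it starts from a Zipf-like corpus of token types and repeatedly "retrains" by resampling from the previous generation's output. Any rare token type that fails to appear in one generation can never return, so the tail of the distribution is squeezed out generation by generation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: a "human" corpus drawn from a Zipf-like distribution,
# so a few token types are common and many are rare.
vocab = np.arange(1000)
probs = 1.0 / (vocab + 1)
probs /= probs.sum()
corpus = rng.choice(vocab, size=20_000, p=probs)
print(f"gen  0: distinct token types = {len(np.unique(corpus))}")

for generation in range(1, 11):
    # "Train" a toy model that can only reproduce what it has seen, then
    # "generate" the next corpus by sampling from it. Token types missing
    # from one generation can never reappear in later ones.
    corpus = rng.choice(corpus, size=20_000, replace=True)
    print(f"gen {generation:2d}: distinct token types = {len(np.unique(corpus))}")
```

Running the sketch shows the count of distinct token types falling monotonically across generations, a stylized version of the loss of rare constructions and minority viewpoints described above.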

Research published in 2023 by Ilia Shumailov and colleagues at the University of Oxford provided empirical grounding for these concerns, demonstrating measurable performance decay after only a few cycles of recursive synthetic training. Their work showed that even modest proportions of AI-generated data in a training corpus could accelerate collapse, with the model eventually producing incoherent or heavily biased outputs. The "silent" descriptor reflects how the degradation often evades standard benchmarks initially, only becoming apparent in edge cases or low-frequency tasks.

Model collapse has significant practical implications for the AI industry. As the volume of AI-generated content on the internet grows, maintaining access to high-quality, human-authored training data becomes both more valuable and more logistically challenging. Proposed mitigations include watermarking synthetic content to enable its exclusion from future training pipelines, curating datasets with strict provenance tracking, and developing evaluation frameworks sensitive enough to catch early-stage collapse before it propagates across model generations.
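As a rough illustration of provenance-based curation, the sketch below filters a training corpus using hypothetical metadata fields; the field names and the watermark-detection flag are illustrative assumptions, not a reference to any specific standard or pipeline.

```python
# Hypothetical provenance filter: "source" and "synthetic_watermark_detected"
# are illustrative field names, not part of any real watermarking standard.
def filter_human_authored(records):
    """Keep records with known provenance that are not flagged as synthetic."""
    for record in records:
        if record.get("synthetic_watermark_detected"):
            continue  # watermark detected: exclude AI-generated content
        if record.get("source") is None:
            continue  # unknown provenance: exclude rather than risk contamination
        yield record


corpus = [
    {"text": "Archived field report, 1998", "source": "archive", "synthetic_watermark_detected": False},
    {"text": "Sure! Here are ten tips...", "source": "web", "synthetic_watermark_detected": True},
    {"text": "Unattributed forum post", "source": None, "synthetic_watermark_detected": False},
]
print([r["text"] for r in filter_human_authored(corpus)])  # keeps only the archived report
```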

Related

Model Collapse

When generative models lose output diversity, repeatedly producing identical or near-identical results.

Generality: 602
Mode Collapse

When a GAN generator produces repetitive, low-diversity outputs instead of capturing full data distribution.

Generality: 602
Hallucination

When AI models confidently generate plausible but factually incorrect or fabricated outputs.

Generality: 794
Context Rot

Gradual degradation of an AI system's context, producing stale or contradictory outputs over time.

Generality: 107
Reasoning Instability

When AI models produce inconsistent or contradictory reasoning across similar inputs.

Generality: 395
Mirage Effect

When multimodal AI models produce confident visual analysis from images that were never provided.

Generality: 542