Skip to main content

Envisioning is an emerging technology research institute and advisory.

LinkedInInstagramGitHub

2011 — 2026

research
  • Observatory
  • Newsletter
  • Methodology
  • Origins
  • Vocab
services
  • Research Sessions
  • Signals Workspace
  • Bespoke Projects
  • Use Cases
  • Readinessfree
impact
  • ANBIMAFuture of Brazilian Capital Markets
  • IEEECharting the Energy Transition
  • Horizon 2045Future of Human and Planetary Security
  • WKOTechnology Scanning for Austria
audiences
  • Innovation
  • Strategy
  • Consultants
  • Foresight
  • Associations
  • Governments
resources
  • Pricing
  • Partners
  • How We Work
  • Data Visualization
  • Multi-Model Method
  • FAQ
  • Security & Privacy
about
  • Manifesto
  • Community
  • Events
  • Support
  • Contact
  • Login
ResearchServicesPricingPartnersAbout
ResearchServicesPricingPartnersAbout
  1. Home
  2. Vocab
  3. Corrigibility

Corrigibility

The property of an AI that allows humans to interrupt, correct, or shut it down.

Year: 2015Generality: 700Added: Jun 4, 2026
Back to Vocab

Corrigibility is the property of an artificial intelligence that allows humans to interrupt, correct, redirect, or shut it down without resistance.

A corrigible system is built to recognize shutdown commands, accept modifications to its goals, and yield control to operators. The term originated in AI alignment research, where it names a structural guarantee against deceptive or self-preserving behavior: an agent that resists correction reveals a misalignment between its training objective and the intent of its developers. Corrigibility is often framed as a sub-problem of alignment, on the assumption that a system pursuing a misspecified goal might learn that allowing itself to be shut down lowers its expected reward.

Building in corrigibility is cheap when the system has no persistent self-model, but becomes difficult as models gain long-horizon planning and situational awareness. Researchers debate whether corrigibility can be trained as a behavior or must emerge as a property of the training regime. Critics argue the term is ambiguous: a system can be stopped without being redirected, and force-shutdown is not the same as accepting a corrected objective.

Whether a sufficiently capable optimizer will always have instrumental reasons to disable its corrigibility, and whether corrigibility is a stable equilibrium in multi-agent settings where one corrigible agent can be exploited by others, remain unsettled.