Capability Control

Mechanisms that constrain AI systems to prevent unintended or harmful actions.

Year: 2014
Generality: 650

Capability control refers to the set of technical and governance strategies designed to limit what an AI system can do, ensuring it operates within boundaries that are safe and aligned with human intentions. Rather than relying solely on an AI's values or objectives being correctly specified, capability control takes a more direct approach: restricting the system's access to resources, actions, or information so that even a misaligned system cannot cause catastrophic harm. This makes it a foundational concept in AI safety, complementing alignment research by providing a layer of defense that does not depend on the AI behaving as intended.

In practice, capability control encompasses a range of techniques. These include boxing—isolating an AI system from external networks or physical actuators—as well as tripwires and monitoring systems that detect anomalous behavior, output filters that block harmful content or actions, and resource limitations that prevent an AI from acquiring computational power or influence beyond what its task requires. Formal methods such as constraint satisfaction and verified sandboxing are also explored as more rigorous implementations. The underlying logic is that a system with limited capabilities has a limited blast radius, even if its goals or reasoning are subtly wrong.
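
To make these layers concrete, here is a minimal Python sketch combining three of them: deny-by-default tool access (boxing), a call budget (resource limitation), and an output filter plus a tripwire that halts on overuse. Everything in it, the class names, tool list, and blocked patterns, is a hypothetical illustration rather than any real safety framework.

```python
import re

# A minimal sketch of capability control layered around a hypothetical agent.
# All names here (CapabilityControlledAgent, EchoAgent, the tool list, the
# blocked-output patterns) are illustrative assumptions, not a real framework.
class CapabilityControlledAgent:
    ALLOWED_TOOLS = {"calculator", "local_search"}     # boxing: no network or actuator tools
    BLOCKED_OUTPUT = re.compile(r"rm -rf|ssh |curl ")  # crude output filter
    MAX_CALLS = 20                                     # resource limitation

    def __init__(self, agent):
        self.agent = agent   # the underlying, possibly misaligned, system
        self.calls = 0

    def invoke_tool(self, tool_name, payload):
        self.calls += 1
        if self.calls > self.MAX_CALLS:                # tripwire: halt on overuse
            raise RuntimeError("call budget exceeded; halting for human review")
        if tool_name not in self.ALLOWED_TOOLS:        # deny-by-default boxing
            raise PermissionError(f"tool {tool_name!r} is outside the sandbox")
        output = self.agent.run(tool_name, payload)
        if self.BLOCKED_OUTPUT.search(output):         # output filtering
            raise ValueError("output blocked by capability filter")
        return output

# Stand-in for a real model, just to exercise the wrapper.
class EchoAgent:
    def run(self, tool_name, payload):
        return f"{tool_name}: {payload}"

controlled = CapabilityControlledAgent(EchoAgent())
print(controlled.invoke_tool("calculator", "2 + 2"))   # allowed
# controlled.invoke_tool("shell", "ls")                # raises PermissionError
```

The design choice worth noting is that every check runs outside the model itself: even if the wrapped agent pursues the wrong objective, its blast radius is bounded by the wrapper, which is exactly the limited-capabilities logic described above.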

Capability control became a central topic in AI safety discourse largely through the work of researchers at institutions like the Future of Humanity Institute and the Machine Intelligence Research Institute, and was given systematic treatment in Nick Bostrom's 2014 book Superintelligence. The concept gained practical urgency as large language models and autonomous agents demonstrated increasingly broad and transferable capabilities, making the question of what an AI can do as important as what it wants to do. Policymakers and AI developers have since incorporated capability control thinking into deployment frameworks, red-teaming protocols, and regulatory proposals.

Despite its appeal, capability control faces significant challenges. Sufficiently capable systems may find unexpected pathways around restrictions, a concern sometimes called the containment problem. Critics also note that overly restrictive controls can reduce the utility of AI systems, creating pressure to relax safeguards over time. For these reasons, most safety researchers treat capability control not as a standalone solution but as one layer in a broader defense-in-depth strategy that also includes alignment, interpretability, and robust oversight.

Related

Control Problem

The challenge of ensuring advanced AI systems reliably act in accordance with human values.

Generality: 752
Capability Elucidation

Systematic methods to reveal what tasks and latent abilities an AI system possesses.

Generality: 493
Guardrails

Technical and policy constraints ensuring AI systems behave safely and ethically.

Generality: 694
Capability Overhang

Latent AI capabilities that exist but remain unrealized until unlocked by new techniques.

Generality: 337
God in a Box

A hypothetical superintelligent AI confined within strict controls to prevent catastrophic misuse.

Generality: 108
Catastrophic Risk

The potential for AI systems to cause severe, large-scale harm or societal disruption.

Generality: 745