
Envisioning is an emerging technology research institute and advisory.

Steerability

The ability to deliberately guide a neural network's outputs through controlled input or parameter modifications.

Year: 2022 · Generality: 550

Steerability refers to the capacity to intentionally direct a neural network's outputs in a predictable and controlled manner by applying systematic modifications to its inputs, activations, or internal parameters. Rather than treating a model as a black box that produces fixed outputs for given inputs, steerability treats the model as a controllable system where specific interventions reliably produce specific changes. This property is especially valuable in generative models, where practitioners may want to adjust attributes like style, tone, or content without retraining the entire network from scratch.

In practice, steerability is achieved through several mechanisms. Steering vectors — directions identified in a model's activation space that correspond to meaningful semantic attributes — can be added to or subtracted from intermediate representations to shift outputs in desired ways. In large language models, for instance, a steering vector associated with a concept like "formality" or "sentiment" can be applied at inference time to nudge the model's behavior without any gradient updates. Similarly, in image generation, latent space arithmetic allows users to blend or isolate visual features by manipulating the coordinates of generated samples.
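As a minimal sketch of the steering-vector idea described above (not Envisioning's code, and with synthetic data standing in for real model activations), the example below derives a "formality" direction as the difference of mean activations between two prompt sets, then adds it to a new prompt's hidden states at inference time:

```python
import numpy as np

def apply_steering(hidden_states: np.ndarray,
                   steering_vector: np.ndarray,
                   strength: float = 1.0) -> np.ndarray:
    """Shift each hidden state along a steering direction.

    hidden_states: (seq_len, d_model) activations from one layer.
    steering_vector: (d_model,) direction tied to a concept, e.g. "formality".
    strength: positive values push toward the concept, negative away.
    """
    # Normalize so `strength` is measured in units of activation norm.
    direction = steering_vector / np.linalg.norm(steering_vector)
    return hidden_states + strength * direction

# Hypothetical setup: random arrays stand in for layer activations
# collected from "formal" vs. "informal" prompts.
rng = np.random.default_rng(0)
formal_acts = rng.normal(0.5, 1.0, size=(8, 16))
informal_acts = rng.normal(-0.5, 1.0, size=(8, 16))

# The steering vector is the difference of the two activation means.
steering_vec = formal_acts.mean(axis=0) - informal_acts.mean(axis=0)

h = rng.normal(size=(4, 16))  # activations for a new prompt
h_steered = apply_steering(h, steering_vec, strength=2.0)

# Mean projection onto the (unit) steering direction increases by
# exactly `strength`, since no gradient update touches the weights.
def proj(x):
    return float((x @ steering_vec / np.linalg.norm(steering_vec)).mean())
```

In a real transformer this addition would be applied via a forward hook at a chosen layer; the point is that the intervention is a cheap, inference-time edit to activations, with no retraining.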

Steerability is closely related to, but distinct from, interpretability. While interpretability asks what a model has learned, steerability asks how that knowledge can be leveraged for targeted control. This makes it a practical tool for alignment research, where the goal is to ensure AI systems behave according to human intentions. Demonstrating that a model is steerable provides evidence that its internal representations are structured and meaningful rather than arbitrary, which in turn supports safer and more predictable deployment.

The concept gained particular traction in the early 2020s as large language models became powerful enough that fine-grained behavioral control became both necessary and technically feasible. Researchers found that activation spaces in transformer-based models often contain linear directions corresponding to human-interpretable concepts, making vector-based steering surprisingly effective. Steerability now sits at the intersection of mechanistic interpretability, AI safety, and controllable generation, representing a key frontier in making powerful models more reliably aligned with user intent.

Related

Control Vector

A steering mechanism that shapes language model outputs by modifying internal activations.

Generality: 322
Control Problem

The challenge of ensuring advanced AI systems reliably act in accordance with human values.

Generality: 752
Model Stability

A model's ability to produce consistent, reliable outputs across varying inputs and data conditions.

Generality: 708
Interpretability

The degree to which humans can understand why an AI system made a decision.

Generality: 800
Capability Control

Mechanisms that constrain AI systems to prevent unintended or harmful actions.

Generality: 650
Capability Elucidation

Systematic methods to reveal what tasks and latent abilities an AI system possesses.

Generality: 493