Envisioning is an emerging technology research institute and advisory.

Control Vector

A steering mechanism that shapes language model outputs by modifying internal activations.

Year: 2023
Generality: 322

A control vector is a direction in a neural network's activation space (typically that of a large language model) that is added to the network's internal activations to shift its behavior along a specific axis, such as tone, formality, sentiment, or topic focus. Unlike prompting, which operates at the input level, a control vector intervenes directly in the model's hidden states during inference. This can steer behavior more consistently than prompting, and it leaves the model's weights unchanged.

The technique works by identifying a direction in the model's activation space that corresponds to a target concept or behavioral trait. This direction is typically derived from contrastive pairs of inputs, for example prompts that are formal versus informal, or honest versus deceptive: the model's activations are recorded for each side of the pair and the two sets are contrasted. The resulting vector approximates the latent dimension along which the model internally represents that concept. At inference time, a scaled copy of this vector is added to (or subtracted from) the residual stream at one or more layers, nudging the model's representations toward or away from the target behavior.
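
The extraction and application steps described above can be sketched in a few lines of plain Python. This is a toy illustration, not a reference implementation: it assumes the contrast is computed as a difference of mean activations (one simple choice among several), the three-dimensional "activations" are made-up numbers rather than real model states, and the function names `control_vector` and `steer` are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def control_vector(positive_acts, negative_acts):
    """Difference of means: points from the negative examples toward the positive ones.

    Assumption: difference of means is used here for simplicity; other
    ways of contrasting the two activation sets exist.
    """
    pos_mean = mean_vector(positive_acts)
    neg_mean = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer(hidden, vector, strength=1.0):
    """Add the scaled control vector to a single hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Made-up 3-dimensional activations for "formal" and "informal" prompts.
formal_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
informal_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

formality = control_vector(formal_acts, informal_acts)

hidden = [0.5, 0.5, 0.5]
more_formal = steer(hidden, formality, strength=0.5)   # push toward "formal"
less_formal = steer(hidden, formality, strength=-0.5)  # push away from "formal"
```

In a real model the same addition would be applied to the residual stream at the chosen layers for every generated token; the sign and magnitude of `strength` decide the direction and intensity of the shift.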

Control vectors occupy a practical middle ground between prompt engineering and full fine-tuning. Prompt engineering is flexible but can be inconsistent, and it is limited to behavior that can be specified as text within the model's context window. Fine-tuning offers deep behavioral change but requires significant compute and risks degrading general capabilities. Control vectors are lightweight, reusable, and composable: multiple vectors can be applied simultaneously with different scaling factors, enabling multi-dimensional control over model outputs at inference time.
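
Composition with per-vector scaling factors amounts to a weighted sum added to the hidden state. The sketch below assumes the vectors have already been extracted; the helper name `compose` and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def compose(hidden, weighted_vectors):
    """Add several control vectors, each with its own scale, to one hidden state.

    weighted_vectors is a list of (scale, vector) pairs.
    """
    steered = list(hidden)
    for scale, vector in weighted_vectors:
        if len(vector) != len(steered):
            raise ValueError("control vector and hidden state must have the same size")
        steered = [h + scale * v for h, v in zip(steered, vector)]
    return steered


# Two made-up control vectors over a 3-dimensional hidden state.
formality = [2.0, -2.0, 0.0]
positivity = [0.0, 1.0, 1.0]

hidden = [0.5, 0.5, 0.5]

# Increase formality moderately while pushing sentiment in the negative direction.
result = compose(hidden, [(0.5, formality), (-1.0, positivity)])
```

Because the operation is a plain sum, the order in which vectors are applied does not matter, and any single vector can be switched off by setting its scale to zero.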

The approach gained significant attention in the ML community around 2023, particularly following work on representation engineering and activation steering in transformer-based models. It has practical applications in AI safety (suppressing harmful outputs), personalization (adapting tone or style), and interpretability research (probing what concepts models internally represent). As language models grow larger and more capable, control vectors offer an efficient and interpretable mechanism for aligning model behavior with specific human preferences or deployment constraints.

Related

Steerability

The ability to deliberately guide a neural network's outputs through controlled input or parameter modifications.

Generality: 550
ControlNet

A neural network architecture that adds precise spatial controls to pretrained diffusion models.

Generality: 292
Capability Control

Mechanisms that constrain AI systems to prevent unintended or harmful actions.

Generality: 650
Control Problem

The challenge of ensuring advanced AI systems reliably act in accordance with human values.

Generality: 752
Word Vector

Dense numerical representations of words encoding semantic meaning and linguistic relationships.

Generality: 720
Contextual Embedding

Word representations that dynamically shift meaning based on surrounding context.

Generality: 752