Skip to main content

Envisioning is an emerging technology research institute and advisory.

LinkedInInstagramGitHub

2011 — 2026

research
  • Reports
  • Newsletter
  • Methodology
  • Origins
  • Vocab
services
  • Research Sessions
  • Signals Workspace
  • Bespoke Projects
  • Use Cases
  • Signal Scanfree
  • Readinessfree
impact
  • ANBIMAFuture of Brazilian Capital Markets
  • IEEECharting the Energy Transition
  • Horizon 2045Future of Human and Planetary Security
  • WKOTechnology Scanning for Austria
audiences
  • Innovation
  • Strategy
  • Consultants
  • Foresight
  • Associations
  • Governments
resources
  • Pricing
  • Partners
  • How We Work
  • Data Visualization
  • Multi-Model Method
  • FAQ
  • Security & Privacy
about
  • Manifesto
  • Community
  • Events
  • Support
  • Contact
  • Login
ResearchServicesPricingPartnersAbout
ResearchServicesPricingPartnersAbout
  1. Home
  2. Vocab
  3. Prompt Injection

Prompt Injection

Manipulating AI language models by embedding malicious instructions within input prompts.

Year: 2022Generality: 499
Back to Vocab

Prompt injection is a class of adversarial attack targeting large language models (LLMs) and other instruction-following AI systems, in which an attacker embeds hidden or conflicting directives within an input prompt to override the model's intended behavior. Because modern LLMs are trained to follow natural language instructions, they often cannot reliably distinguish between legitimate user commands and malicious instructions smuggled inside seemingly benign text. This makes prompt injection fundamentally different from traditional software exploits — there is no code execution vulnerability to patch, only a model that is, by design, responsive to language.

Attacks take several forms. Direct prompt injection occurs when a user deliberately crafts their own input to bypass safety guidelines, extract system prompts, or coerce the model into producing restricted content. Indirect prompt injection is more insidious: malicious instructions are hidden in external content the model retrieves or processes — a webpage, a document, or a database record — causing the model to act on attacker-controlled commands without the user's knowledge. As LLMs are increasingly deployed as autonomous agents with access to tools, APIs, and sensitive data, indirect injection poses serious security risks, potentially enabling data exfiltration, unauthorized actions, or manipulation of downstream systems.

Defending against prompt injection is an open and difficult problem. Proposed mitigations include input sanitization, privilege separation between user and system instructions, fine-tuning models to be more robust to adversarial prompts, and architectural approaches that treat retrieved content as untrusted data. However, no solution has proven fully effective, partly because the same generalization ability that makes LLMs useful also makes them susceptible to novel injection patterns they have not been trained to resist.

Prompt injection matters well beyond academic security research. As organizations deploy LLM-powered applications in customer service, coding assistance, healthcare, and enterprise automation, the attack surface grows substantially. Understanding and mitigating prompt injection is now considered a core concern in responsible AI deployment, and it has prompted dedicated research tracks, red-teaming practices, and emerging regulatory guidance around AI system security.

Related

Related

Prompt Engineering
Prompt Engineering

Crafting input text strategically to elicit desired outputs from AI language models.

Generality: 694
Prompt
Prompt

A text input given to a language model to elicit a desired response.

Generality: 796
System Prompt
System Prompt

Hidden instructions given to a language model that shape its behavior and persona.

Generality: 620
System Prompt Learning
System Prompt Learning

Automatically optimizing persistent model instructions to steer behavior without full retraining.

Generality: 520
Super Prompting
Super Prompting

Crafting highly specific input prompts to steer AI models toward desired outputs.

Generality: 450
Underprompting
Underprompting

Providing insufficient context or instruction in a prompt, degrading AI output quality.

Generality: 293