Manipulating AI language models by embedding malicious instructions within input prompts.
Prompt injection is a class of adversarial attack targeting large language models (LLMs) and other instruction-following AI systems, in which an attacker embeds hidden or conflicting directives within an input prompt to override the model's intended behavior. Because modern LLMs are trained to follow natural language instructions, they often cannot reliably distinguish between legitimate user commands and malicious instructions smuggled inside seemingly benign text. This makes prompt injection fundamentally different from traditional software exploits — there is no code execution vulnerability to patch, only a model that is, by design, responsive to language.
Attacks take several forms. Direct prompt injection occurs when a user deliberately crafts their own input to bypass safety guidelines, extract system prompts, or coerce the model into producing restricted content. Indirect prompt injection is more insidious: malicious instructions are hidden in external content the model retrieves or processes — a webpage, a document, or a database record — causing the model to act on attacker-controlled commands without the user's knowledge. As LLMs are increasingly deployed as autonomous agents with access to tools, APIs, and sensitive data, indirect injection poses serious security risks, potentially enabling data exfiltration, unauthorized actions, or manipulation of downstream systems.
Defending against prompt injection is an open and difficult problem. Proposed mitigations include input sanitization, privilege separation between user and system instructions, fine-tuning models to be more robust to adversarial prompts, and architectural approaches that treat retrieved content as untrusted data. However, no solution has proven fully effective, partly because the same generalization ability that makes LLMs useful also makes them susceptible to novel injection patterns they have not been trained to resist.
Prompt injection matters well beyond academic security research. As organizations deploy LLM-powered applications in customer service, coding assistance, healthcare, and enterprise automation, the attack surface grows substantially. Understanding and mitigating prompt injection is now considered a core concern in responsible AI deployment, and it has prompted dedicated research tracks, red-teaming practices, and emerging regulatory guidance around AI system security.