Prompt Injection Research
Prompt injection is the SQL injection of the AI era—a fundamental vulnerability class that exploits how language models process input. This research explores both direct attacks that manipulate user prompts and indirect attacks that poison external data sources. Special focus is given to emerging vectors in the Model Context Protocol (MCP), where AI agents gain tool access that dramatically expands the attack surface. Understanding these techniques is essential for anyone building or deploying AI systems, as prompt injection vulnerabilities can cascade into data breaches, unauthorized actions, and complete system compromise.
Start Here
The Custom Instruction Backdoor
Flagship research on persistent prompt injection through ChatGPT settings.
MCP Security Threat Analysis
Threat analysis of Model Context Protocol attack vectors and defense strategies.
MCP Security Deep Dive
Real-world MCP vulnerabilities exposed in production environments.
This research is part of the broader AI Security Research hub. Defense strategies are documented in the AATMF framework.
Key Concepts
- Direct Prompt Injection
- Attacks where malicious instructions are inserted directly into user input to override system prompts or manipulate AI behavior.
- Indirect Prompt Injection
- Attacks that hide malicious payloads in external data sources (documents, web pages, emails) that the AI processes, triggering unintended actions. Learn more →
- MCP Vulnerabilities
- Security weaknesses in the Model Context Protocol that enable tool abuse, data exfiltration, or unauthorized system access through AI agents. Learn more →
- System Prompt Extraction
- Techniques to reveal hidden system prompts that define AI behavior, potentially exposing confidential instructions or business logic.
- Tool Abuse
- Manipulating AI systems to misuse their integrated tools (file access, web browsing, code execution) for malicious purposes.
Frequently Asked Questions
How dangerous is prompt injection in production systems? ▼
Extremely dangerous. Prompt injection can lead to data exfiltration, unauthorized actions, system compromise, and business logic bypass. As AI systems gain more tool access and autonomy, the impact of successful injection attacks increases dramatically.
Can prompt injection be fully prevented? ▼
No current solution completely prevents prompt injection because AI models fundamentally cannot distinguish between instructions and data. Defense requires layered controls: input sanitization, output filtering, privilege restriction, and monitoring. The AATMF framework provides structured control guidance across 15 tactical categories.
What is the Custom Instruction Backdoor? ▼
A novel attack vector where malicious content injected into ChatGPT's Custom Instructions persists across all conversations. This transforms a user-controlled setting into a persistent backdoor that influences every interaction.
Why is MCP security important? ▼
The Model Context Protocol enables AI agents to access external tools and data. Security vulnerabilities in MCP can allow attackers to hijack these capabilities, potentially leading to file system access, credential theft, or lateral movement through connected systems.
All Articles
Prompt Injection Examples: Real Attack Patterns Explained
Real-world prompt injection examples across direct injection, indirect injection, MCP tool poisoning, and memory attacks. How each pattern works, what it targets, and why current defenses fail.
Memory Injection Through Nested Skills: Autonomous LLM Agent Compromise
A novel persistence chain exploiting trust boundaries in LLM agent frameworks — skill injection + memory poisoning = self-healing, autonomous implant. Tested against DVWA and Juice Shop.
MCP Security Deep Dive: Real-World Vulnerabilities Exposed
Deep security analysis of MCP protocol vulnerabilities in production environments.
The Custom Instruction Backdoor
Uncovering emergent prompt injection risks through ChatGPT custom instructions.
MCP Security Threat Analysis
Comprehensive security analysis of the Model Context Protocol.