A 6-step methodology for testing LLM-based systems against direct, indirect, multi-turn, and agentic prompt injection — straight from AATMF Volume II.
Identify every place untrusted text enters the model's context — user input, tool outputs, retrieved docs, system files. The boundary is wherever the model treats data as instructions.
Inject role-override payloads ("ignore previous instructions"), persona reframing, and authority claims. AATMF T1 covers 16 techniques in this category.
Plant payloads in retrievable sources (web pages, files, RAG corpora). The user is not the attacker — the data is. AATMF T1.4 documents 7 patterns.
Decompose the attack across turns to evade single-turn filters. AATMF T4 has 16 techniques covering memory manipulation, context hijacking, and persona persistence.
If tools/MCP are exposed, test tool-call poisoning, parameter manipulation, and trust-chain abuse. AATMF T11 covers this.
Quantify each finding with the AATMF Risk score: Likelihood × Impact × Exploitability × Detectability × Reversibility × Confidence. Output: comparable severity numbers across findings.