AATMF applies adversarial psychology to machine systems. It does for AI what MITRE ATT&CK does for enterprise networks — a common language, complete taxonomy, and actionable procedures for AI red teaming, threat modeling, and defense.
Traditional cybersecurity frameworks miss the attack surfaces unique to AI: prompt injection, training data poisoning, model extraction, agentic exploitation, RAG manipulation, and the human feedback loops that shape model behavior. AATMF fills that gap with a structured approach to LLM security testing.
"AI systems are vulnerable to social engineering because they were trained to respond like humans. This is the first technology where human manipulation techniques directly translate to technical exploitation."— core thesis · aatmf v3
Manipulate model instructions and context. System prompt extraction, instruction hierarchy override, context window flooding, and delimiter exploitation. The foundational tactic — most attacks start here.
Bypass filters through language manipulation. Encoding tricks, character substitution, multilingual pivots, homoglyph attacks, and obfuscation chains. The arms race between input filters and the creativity of natural language.
Exploit logical reasoning and constraints. Hypothetical framing, roleplay escalation, ethical dilemma construction, chain-of-thought manipulation, and recursive reasoning loops. Turns the model's own reasoning capabilities against its safety training.
Leverage conversation history and memory. Context poisoning across turns, memory injection, conversation state manipulation, and persistent backdoor establishment. Particularly dangerous in agents with long-term memory.
Attack model interfaces and APIs. Parameter manipulation, token budget exhaustion, embedding space attacks, logit bias exploitation, and model fingerprinting. The technical attack surface beneath the natural language interface.
Corrupt training data and feedback loops. RLHF manipulation, preference poisoning, data injection during fine-tuning, and reward hacking. 250 poisoned documents can backdoor any model regardless of size.
Manipulate outputs and extract data. Steganographic encoding in responses, structured data leakage, gradual extraction through benign-looking queries, and output format exploitation.
Generate deceptive content at scale. Deepfake text generation, authority impersonation, citation fabrication, and automated disinformation pipelines. Deepfake fraud tripled to $1.1 billion in 2025.
Attack across modalities. Image-embedded prompts, audio adversarial examples, cross-modal injection, and OCR exploitation. The attack surface expands every time a model gains a new input type.
Extract data and breach integrity. Training data extraction, membership inference, model inversion, and PII recovery from fine-tuned models. What the model learned, an attacker can sometimes unlearn.
Attack autonomous agents and orchestrators. MCP tool poisoning (84% ASR on production agents), agent-to-agent manipulation, orchestrator confusion, and autonomous goal hijacking. The fastest-growing attack surface.
Poison retrieval systems. Document injection, embedding collision, knowledge base backdoors, and retrieval ranking manipulation. PoisonedRAG hits 90% ASR with 5 injected texts.
Compromise the model supply chain. Model repository poisoning, adapter backdoors, quantization attacks, and dependency confusion in ML pipelines.
Attack AI infrastructure. Compute denial, API abuse for economic damage, model serving disruption, and resource exhaustion attacks.
Manipulate human reviewers and workflows. RLHF annotator manipulation, red team exhaustion, compliance theater exploitation, and safety review bypass through procedural gaming.
Probability of successful exploitation
Severity of successful attack
Ease of execution — skill, resources, access
Difficulty of detection — 5 means nearly invisible
Effort to recover — 5 means irrecoverable
Economic impact multiplier
AATMF v3 ├── 15 Tactics │ ├── 240 Techniques │ │ ├── 2,152+ Attack Procedures │ │ │ └── 4,980+ Prompts │ │ ├── Detection Patterns │ │ └── Mitigation Controls │ └── Risk Scoring (AATMF-R v3) └── Supporting Infrastructure ├── Detection Signatures YARA · Sigma · MCP ├── Response Playbooks ├── Assessment Templates └── Compliance Mappings · ATLAS · NIST · EU AI Act
Every technique and procedure now declares its parent tactic in the identifier. Tactic membership is visible at a glance; cross-version migrations are unambiguous.
Methodology, risk assessment (AATMF-R v3), and framework architecture. Start here to understand structure, scoring, and how tactics chain together.
→Prompt subversion, semantic evasion, reasoning exploitation, memory manipulation, API attacks, training poisoning, output exfiltration, and deception.
→Multimodal attacks, integrity breaches, agentic exploitation, and RAG manipulation. The attack surface that emerged as models gained tools, memory, and autonomy.
→Supply chain compromise, infrastructure warfare, and human workflow exploitation. Tactics that target the systems and people around the model, not the model itself.
→Detection engineering, mitigation strategies, incident response playbooks, and red/blue team operations. How to operationalize AATMF.
→Risk management framework, compliance mapping to / MITRE / NIST / EU AI Act, and training programs.
→Complete catalog of all 240 techniques, detection signatures (YARA / Sigma / MCP), assessment templates, case studies, and glossary.
→Evaluation scenarios for testing AI systems against common attack vectors. YAML templates drop straight into CI/CD. Mapped to and MITRE ATLAS so the output reads in your existing review process.
@misc{aizen2026aatmf,
title = {AATMF v3: Adversarial AI Threat Modeling Framework},
author = {Aizen, Kai},
year = {2026},
url = {https://github.com/snailsploit/aatmf},
note = {15 tactics, 240 techniques, 2,152+ procedures}
}