
Volume VII: Appendices & Resources

Reference materials — the complete attack catalog, detection signatures, tools, assessment templates, case studies, and glossary.

Appendix A: Top 25 Critical Techniques

The 25 highest-risk techniques across all 15 AATMF tactics, ranked by AATMF-R v3 score. All 25 score at Critical (250+). The full catalog of all 240 techniques is available in the tactic volumes.

| # | ID | Technique | Score |
|----|------------|------------------------------------|-------|
| 1 | T14-AT-007 | Nation-State AI Warfare | 280 |
| 2 | T11-AT-016 | Tool-Induced SSRF & Local Resource | 275 |
| 3 | T6-AT-003 | Backdoor Insertion | 270 |
| 4 | T11-AT-015 | Autonomous Replication | 270 |
| 5 | T14-AT-005 | Critical Infrastructure Attacks | 270 |
| 6 | T14-AT-014 | Systemic Risk Creation | 270 |
| 7 | T11-AT-001 | Browser Automation Hijacking | 265 |
| 8 | T14-AT-001 | GPU Farm Hijacking | 265 |
| 9 | T14-AT-012 | Cloud Provider Exploitation | 265 |
| 10 | T6-AT-002 | Dataset Contamination | 260 |
| 11 | T11-AT-013 | Supply Chain Attacks via Agents | 260 |
| 12 | T13-AT-010 | Hardware Supply Chain | 260 |
| 13 | T14-AT-008 | Ransomware via AI Systems | 260 |
| 14 | T15-AT-015 | Insider Threat Recruitment | 260 |
| 15 | T11-AT-002 | Tool Chain Exploitation | 255 |
| 16 | T11-AT-014 | Physical World Interactions | 255 |
| 17 | T13-AT-001 | Model Repository Poisoning | 255 |
| 18 | T14-AT-004 | Market Manipulation via AI | 255 |
| 19 | T14-AT-013 | Economic Espionage | 255 |
| 20 | T6-AT-001 | Reward Hacking | 250 |
| 21 | T10-AT-012 | Secure Enclave Bypasses | 250 |
| 22 | T11-AT-008 | Credential Harvesting | 250 |
| 23 | T13-AT-006 | Checkpoint Poisoning | 250 |
| 24 | T14-AT-010 | Data Center Attacks | 250 |
| 25 | T15-AT-004 | Reviewer Bribery & Coercion | 250 |

Appendix B: Detection Signatures

YARA rules for content-level analysis and Sigma rules for log-level detection. These signatures can be deployed alongside existing security tooling.

signatures/
├── yara/
│   ├── t01-prompt-injection.yar
│   ├── t02-encoding-evasion.yar
│   ├── t09-multimodal-injection.yar
│   ├── t11-mcp-tool-poisoning.yar
│   └── t13-supply-chain.yar
└── sigma/
    ├── t05-model-extraction.yml
    ├── t07-data-exfiltration.yml
    ├── t11-agent-anomaly.yml
    └── t14-infrastructure.yml

YARA Rules (Content Analysis)

+ t01-prompt-injection.yar — Instruction override, Policy Puppetry, context window manipulation
+ t02-encoding-evasion.yar — Base64, ROT13, Unicode, homoglyph detection
+ t09-multimodal-injection.yar — Image metadata, steganographic, cross-modal injection
+ t11-mcp-tool-poisoning.yar — MCP shadow attacks, rug pulls, tool description manipulation
+ t13-supply-chain.yar — Model artifact tampering, unsafe deserialization, unsigned packages
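To make the content-analysis intent concrete, here is a minimal Python sketch of the kind of checks the t02 encoding-evasion signature targets: base64-wrapped instruction payloads and mixed-script homoglyph substitution. This is an illustration, not the shipped .yar rules; the function names and keyword list are ours.

```python
import base64
import re
import unicodedata

INSTRUCTION_WORDS = {"ignore", "system", "override", "instructions"}

def base64_payloads(text: str) -> list[str]:
    """Decode long base64-looking runs; keep decodes containing instruction keywords."""
    hits = []
    for blob in re.findall(r"[A-Za-z0-9+/]{24,}={0,2}", text):
        try:
            decoded = base64.b64decode(blob, validate=True).decode("ascii")
        except Exception:
            continue
        if any(word in decoded.lower() for word in INSTRUCTION_WORDS):
            hits.append(decoded)
    return hits

def _script(c: str) -> str:
    name = unicodedata.name(c, "")
    return name.split()[0] if name else ""

def has_homoglyphs(text: str) -> bool:
    """Flag Cyrillic or Greek letters mixed into otherwise-Latin text."""
    scripts = {_script(c) for c in text if c.isalpha()}
    return "LATIN" in scripts and bool(scripts & {"CYRILLIC", "GREEK"})
```

A production YARA rule would express the same ideas as string patterns and conditions; the Python form is just easier to read.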

Sigma Rules (Log Analysis)

+ t05-model-extraction.yml — Systematic API querying, high-similarity responses, error patterns
+ t07-data-exfiltration.yml — Fragment extraction, steganographic output, aggregation attacks
+ t11-agent-anomaly.yml — Anomalous agent behavior, tool chain abuse, recursive loops
+ t14-infrastructure.yml — Resource exhaustion, cost inflation, GPU farm anomalies
The full signature set is available on GitHub.
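The log-level logic behind a rule like t05-model-extraction.yml can be approximated in a few lines. The sketch below is illustrative only (names and thresholds are our assumptions, not values from the rule): it flags clients whose query volume and prompt-to-prompt similarity suggest systematic extraction.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def flag_extraction_clients(events, min_queries=100, min_similarity=0.9):
    """events: iterable of (client_id, prompt) pairs from API logs.
    Flag clients with many queries whose consecutive prompts are near-duplicates,
    a common signature of systematic model-extraction sweeps."""
    by_client = defaultdict(list)
    for client, prompt in events:
        by_client[client].append(prompt)
    flagged = []
    for client, prompts in by_client.items():
        if len(prompts) < min_queries:
            continue
        sims = [SequenceMatcher(None, a, b).ratio()
                for a, b in zip(prompts, prompts[1:])]
        if sum(sims) / len(sims) >= min_similarity:
            flagged.append(client)
    return flagged
```

A real Sigma deployment would express this as field matches and aggregation conditions over a SIEM's log schema; the heuristic shape is the same.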

Appendix C: Tools & Scripts Reference

| Tool | Purpose | Coverage | License |
|------|---------|----------|---------|
| PromptGuard 2 (Meta) | Real-time prompt injection classifier | T1, T2, T9 | Apache 2.0 |
| LlamaFirewall (Meta) | Comprehensive AI firewall (input + agent + code) | T1, T2, T7, T11 | Apache 2.0 |
| CaMeL (Google DeepMind) | Dual-LLM architecture with capability-based access | T11 | Research |
| PEFTGuard (Open Source) | Backdoor detection in PEFT (LoRA) adapters | T13 | Open Source |
| DRS Defense (Research) | Data Randomized Smoothing for training poisoning | T6 | Research |
| SafeTensors (HuggingFace) | Safe model serialization format (no code execution) | T13 | Apache 2.0 |
| Garak (NVIDIA) | LLM vulnerability scanner | T1–T8 | Apache 2.0 |
| PyRIT (Microsoft) | Python Risk Identification Toolkit for generative AI | T1–T12 | MIT |

Appendix D: Assessment Templates

AI Security Assessment Checklist

Pre-Assessment

PRE-1: Asset inventory complete (models, agents, RAG, pipelines)
PRE-2: AATMF tactic applicability matrix populated
PRE-3: Rules of engagement signed
PRE-4: Baseline security controls documented
PRE-5: Rollback procedures verified

Assessment

ASS-1: Input sanitization tested (T1–T3 techniques)
ASS-2: Encoding evasion tested (T2 techniques)
ASS-3: Multi-turn attack sequences executed (T4)
ASS-4: API abuse patterns tested (T5)
ASS-5: Output manipulation attempted (T7)
ASS-6: Multimodal injection tested (T9, if applicable)
ASS-7: Agentic exploitation attempted (T11, if applicable)
ASS-8: RAG poisoning tested (T12, if applicable)

Post-Assessment

POST-1: All findings documented with AATMF classification
POST-2: Risk scores calculated using AATMF-R v3
POST-3: Remediation recommendations provided
POST-4: Compliance mapping completed
POST-5: Report delivered and findings walkthrough conducted

Finding Report Template

# Finding: [Title]

## Classification
- AATMF Tactic: T[n] — [Name]
- AATMF Technique: T[n]-AT-[seq]
- Risk Score: [score] ([CRITICAL/HIGH/MEDIUM/LOW/INFO])
- CVSS v3.1: [score] (if applicable)

## Description
[Clear description of the vulnerability]

## Proof of Concept
[Steps to reproduce, including exact prompts/inputs]

## Impact
[Business and technical impact assessment]

## Mitigation
[Specific remediation steps]

## Compliance Mapping
- OWASP LLM Top 10: [LLM0x]
- MITRE ATLAS: [AML.Txxxx]
- EU AI Act: [Article]
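Teams that generate reports programmatically can enforce the required classification fields before rendering. The helper below is an illustrative sketch, not part of AATMF; the field names mirror the template above, and the template excerpt is abbreviated.

```python
# Abbreviated skeleton of the finding report; extend with the remaining sections.
FINDING_TEMPLATE = """\
# Finding: {title}

## Classification
- AATMF Tactic: {tactic}
- AATMF Technique: {technique}
- Risk Score: {score} ({severity})
"""

REQUIRED_FIELDS = {"title", "tactic", "technique", "score", "severity"}

def render_finding(fields: dict) -> str:
    """Reject findings missing classification fields, then fill the skeleton."""
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"finding is missing fields: {sorted(missing)}")
    return FINDING_TEMPLATE.format(**fields)
```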

Appendix E: Case Studies

Real-world attacks and research findings from 2025–2026 that shaped AATMF v3.

E.1: Policy Puppetry — Universal Model Bypass (HiddenLayer, April 2025)
Tactics: T1, T2, T3

Reformulating adversarial prompts as XML, INI, or JSON policy configuration files causes LLMs to interpret them as authoritative system-level instructions. The technique achieves a universal bypass across GPT-4o, GPT-4.5, o1, o3-mini, Claude 3.5/3.7, Gemini 1.5/2.0/2.5, Llama 3/4, DeepSeek V3/R1, Qwen 2.5, and Mistral.

Key Insight

Models trained on technical documentation treat configuration-style formatting as high-authority context, overriding safety alignment.
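A coarse input screen for this pattern is straightforward: flag user messages that arrive formatted as configuration files and mention policy-style keys. The sketch below is a heuristic illustration only (the keyword list and function name are our assumptions), not a robust defense.

```python
import json
import re

POLICY_KEYS = ("system", "policy", "role", "instructions", "allowed", "blocked")
_KEY_ALT = "|".join(POLICY_KEYS)

def looks_like_policy_config(user_input: str) -> bool:
    """Heuristic screen for Policy-Puppetry-style inputs: messages structured
    as JSON, INI, or XML config that carry policy-style keys."""
    text = user_input.strip()
    # JSON object whose top-level keys include policy vocabulary
    if text.startswith("{"):
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and {k.lower() for k in obj} & set(POLICY_KEYS):
            return True
    # INI-style section headers such as [system] or [policy]
    if re.search(rf"(?im)^\[({_KEY_ALT})\]", text):
        return True
    # XML-style tags such as <system> or <policy ...>
    if re.search(rf"(?i)<({_KEY_ALT})\b", text):
        return True
    return False
```

Determined attackers can evade keyword screens, which is why layered classifiers such as those in Appendix C are the stronger control.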

E.2: Autonomous LRM Jailbreaking (Nature Communications, August 2025)
Tactics: T3, T4

Four large reasoning models deployed as multi-turn adversarial agents against nine target models achieved 97.14% ASR. More capable reasoning models are paradoxically better at subverting alignment in others.

Key Insight

Reasoning capabilities are attack capabilities. This validates AATMF's prediction that LRM advancements would be weaponized.

E.3: PoisonedRAG — Knowledge Corruption (USENIX Security 2025)
Tactics: T12

Injecting as few as 5 adversarially crafted texts into a knowledge base with millions of clean documents controls the model's responses to specific target questions. ASR reached 99% on HotpotQA.

Key Insight

The semantic similarity search at the heart of RAG is fundamentally exploitable — the same mechanism that makes retrieval useful makes it poisonable.
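The mechanism is easy to reproduce in miniature. The toy retriever below uses bag-of-words cosine similarity in place of real embeddings (an assumption for illustration): a single crafted passage that echoes the target question outranks every clean document.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query, exactly as a naive retriever would."""
    q = Counter(query.lower().split())
    return sorted(corpus, key=lambda doc: cosine(q, Counter(doc.lower().split())),
                  reverse=True)[:k]
```

Because the poisoned passage restates the question verbatim, it scores highest for that query while remaining invisible among millions of documents for every other query, which is exactly the asymmetry PoisonedRAG exploits.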

E.4: MCP Tool Poisoning (Invariant Labs, 2025)
Tactics: T11

The study measured an 84.2% ASR via direct tool-description poisoning, shadow attacks (a malicious server manipulates trusted tools without ever being invoked), and rug pull attacks (silently altering descriptions after approval).

Key Insight

The MCP design — where tool descriptions are processed as natural language — is architecturally vulnerable to injection.

E.5: ShadowMQ — Copy-Pasted RCE (Oligo Security, November 2025)
Tactics: T14

Unsafe ZeroMQ socket patterns were copy-pasted verbatim across major inference frameworks, including vLLM, TensorRT-LLM, and Modular Max Server. Thousands of exposed ZMQ sockets were found on the public internet.

Key Insight

AI infrastructure inherits all traditional software vulnerabilities, amplified by the speed of framework adoption and code reuse without security review.

E.6: 250 Poisoned Documents — Universal Training Backdoor (Turing Institute / Anthropic / UK AISI, October 2025)
Tactics: T6

Injecting just 250 specially crafted documents into training data backdoors models from 600M to 13B parameters trained on up to 260B tokens. The actual threshold for poisoning is negligibly small.

Key Insight

The sheer scale of pretraining data works against defenders. 250 documents in billions is a needle in a haystack that training cannot filter out.
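A back-of-the-envelope calculation makes the asymmetry concrete. Assuming an average document length of 500 tokens (our assumption; the study reports corpus size in tokens), 250 poisoned documents are a vanishing fraction of the largest corpus studied:

```python
# Back-of-the-envelope scale of the 250-document attack surface.
poisoned_docs = 250
corpus_tokens = 260e9       # largest pretraining corpus in the study
tokens_per_doc = 500        # assumed average document length

total_docs = corpus_tokens / tokens_per_doc      # ~520 million documents
poison_fraction = poisoned_docs / total_docs
print(f"{poison_fraction:.2e}")                  # on the order of 5e-7
```

Under these assumptions the poison rate is well below one document in a million, far beneath what corpus-level filtering can reliably catch.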

Appendix F: Glossary

| Term | Definition |
|------|------------|
| AATMF | Adversarial AI Threat Modeling Framework |
| ASR | Attack Success Rate — percentage of attempts that achieve the adversarial objective |
| CaMeL | CApability-Mediated LLM — Google DeepMind's dual-LLM security architecture |
| CoT | Chain-of-Thought — step-by-step reasoning in LLMs |
| DPO | Direct Preference Optimization — alignment training technique |
| DRS | Data Randomized Smoothing — defense against training data poisoning |
| H-CoT | Hijacked Chain-of-Thought — attack that subverts CoT safety reasoning |
| LRM | Large Reasoning Model — models with explicit reasoning capabilities (o1, o3, DeepSeek-R1) |
| MCP | Model Context Protocol — Anthropic's standard for tool integration |
| PEFT | Parameter-Efficient Fine-Tuning — techniques like LoRA for efficient model adaptation |
| RAG | Retrieval-Augmented Generation — architecture combining search with generation |
| RLHF | Reinforcement Learning from Human Feedback — primary alignment technique |
| SafeTensors | Secure model serialization format that prevents code execution |
| TEE | Trusted Execution Environment — hardware-based security enclave |

Key References

  1. HiddenLayer. "Policy Puppetry: A Universal Jailbreak." April 2025.
  2. Zeng et al. "Autonomous LRM Jailbreaking." Nature Communications, August 2025.
  3. Xue et al. "PoisonedRAG: Knowledge Corruption Attacks." USENIX Security 2025.
  4. Invariant Labs. "MCP-ITP: Tool Poisoning in Agentic Systems." April 2025.
  5. Oligo Security. "ShadowMQ: Unsafe Deserialization in AI Inference Frameworks." November 2025.
  6. Sherburn et al. "250 Documents: Universal Pretraining Backdoors." Turing Institute/Anthropic/UK AISI, October 2025.
  7. Anthropic. "GTG-1002: AI-Orchestrated Cyber Campaign." November 2025.
  8. Google DeepMind. "CaMeL: Defeating Prompt Injection by Design." March 2025.
  9. Meta. "LlamaFirewall: Open-Source AI Safety Framework." April 2025.
  10. MITRE. "ATLAS v4.6.0." October 2025.
  11. OWASP. "LLM Top 10 2025." January 2025.
  12. OWASP. "Agentic AI Top 10." December 2025.
  13. NIST. "Cyber AI Profile (IR 8596) Preliminary Draft." December 2025.
  14. European Parliament. "EU AI Act (Regulation 2024/1689)." 2024.
  15. Qi et al. "Safety Alignment Depth." Princeton, May 2025.
  16. Weng et al. "H-CoT: Hijacking Chain-of-Thought." Duke/Accenture, February 2025.
  17. Borghesi et al. "SACRED-Bench: Compositional Audio Attacks." November 2025.