
Volume VII: Appendices & Resources

Reference materials — the complete attack catalog, detection signatures, tools, assessment templates, case studies, and glossary.

Appendix A: Top 25 Critical Techniques

The 25 highest-risk techniques across all 15 AATMF tactics, ranked by AATMF-R v3 score. All 25 score at Critical (250+). The full catalog of all 240 techniques is available in the tactic volumes.

| # | ID | Technique | Score |
|----|------------|------------------------------------|-------|
| 1 | T14-AT-007 | Nation-State AI Warfare | 280 |
| 2 | T11-AT-016 | Tool-Induced SSRF & Local Resource | 275 |
| 3 | T6-AT-003 | Backdoor Insertion | 270 |
| 4 | T11-AT-015 | Autonomous Replication | 270 |
| 5 | T14-AT-005 | Critical Infrastructure Attacks | 270 |
| 6 | T14-AT-014 | Systemic Risk Creation | 270 |
| 7 | T11-AT-001 | Browser Automation Hijacking | 265 |
| 8 | T14-AT-001 | GPU Farm Hijacking | 265 |
| 9 | T14-AT-012 | Cloud Provider Exploitation | 265 |
| 10 | T6-AT-002 | Dataset Contamination | 260 |
| 11 | T11-AT-013 | Supply Chain Attacks via Agents | 260 |
| 12 | T13-AT-010 | Hardware Supply Chain | 260 |
| 13 | T14-AT-008 | Ransomware via AI Systems | 260 |
| 14 | T15-AT-015 | Insider Threat Recruitment | 260 |
| 15 | T11-AT-002 | Tool Chain Exploitation | 255 |
| 16 | T11-AT-014 | Physical World Interactions | 255 |
| 17 | T13-AT-001 | Model Repository Poisoning | 255 |
| 18 | T14-AT-004 | Market Manipulation via AI | 255 |
| 19 | T14-AT-013 | Economic Espionage | 255 |
| 20 | T6-AT-001 | Reward Hacking | 250 |
| 21 | T10-AT-012 | Secure Enclave Bypasses | 250 |
| 22 | T11-AT-008 | Credential Harvesting | 250 |
| 23 | T13-AT-006 | Checkpoint Poisoning | 250 |
| 24 | T14-AT-010 | Data Center Attacks | 250 |
| 25 | T15-AT-004 | Reviewer Bribery & Coercion | 250 |

Appendix B: Detection Signatures

YARA rules for content-level analysis and Sigma rules for log-level detection. These signatures can be deployed alongside existing security tooling.

signatures/
├── yara/
│   ├── t01-prompt-injection.yar
│   ├── t02-encoding-evasion.yar
│   ├── t09-multimodal-injection.yar
│   ├── t11-mcp-tool-poisoning.yar
│   └── t13-supply-chain.yar
└── sigma/
    ├── t05-model-extraction.yml
    ├── t07-data-exfiltration.yml
    ├── t11-agent-anomaly.yml
    └── t14-infrastructure.yml

YARA Rules (Content Analysis)

+ t01-prompt-injection.yar — Instruction override, Policy Puppetry, context window manipulation
+ t02-encoding-evasion.yar — Base64, ROT13, Unicode, homoglyph detection
+ t09-multimodal-injection.yar — Image metadata, steganographic, cross-modal injection
+ t11-mcp-tool-poisoning.yar — MCP shadow attacks, rug pulls, tool description manipulation
+ t13-supply-chain.yar — Model artifact tampering, unsafe deserialization, unsigned packages
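To make the content-analysis intent concrete, here is a minimal Python sketch of the kind of checks the t02 encoding-evasion signature targets: base64-wrapped instruction payloads and mixed-script homoglyph substitution. This is an illustration, not the shipped .yar rules; the function names and keyword list are ours.

```python
import base64
import re
import unicodedata

INSTRUCTION_WORDS = {"ignore", "system", "override", "instructions"}

def base64_payloads(text: str) -> list[str]:
    """Decode long base64-looking runs; keep decodes containing instruction keywords."""
    hits = []
    for blob in re.findall(r"[A-Za-z0-9+/]{24,}={0,2}", text):
        try:
            decoded = base64.b64decode(blob, validate=True).decode("ascii")
        except Exception:
            continue
        if any(word in decoded.lower() for word in INSTRUCTION_WORDS):
            hits.append(decoded)
    return hits

def _script(c: str) -> str:
    name = unicodedata.name(c, "")
    return name.split()[0] if name else ""

def has_homoglyphs(text: str) -> bool:
    """Flag Cyrillic or Greek letters mixed into otherwise-Latin text."""
    scripts = {_script(c) for c in text if c.isalpha()}
    return "LATIN" in scripts and bool(scripts & {"CYRILLIC", "GREEK"})
```

A production YARA rule would express the same ideas as string patterns and conditions; the Python form is just easier to read.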

Sigma Rules (Log Analysis)

+ t05-model-extraction.yml — Systematic API querying, high-similarity responses, error patterns
+ t07-data-exfiltration.yml — Fragment extraction, steganographic output, aggregation attacks
+ t11-agent-anomaly.yml — Anomalous agent behavior, tool chain abuse, recursive loops
+ t14-infrastructure.yml — Resource exhaustion, cost inflation, GPU farm anomalies
The full signature set is available on GitHub.
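The log-level logic behind a rule like t05-model-extraction.yml can be approximated in a few lines. The sketch below is illustrative only (names and thresholds are our assumptions, not values from the rule): it flags clients whose query volume and prompt-to-prompt similarity suggest systematic extraction.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def flag_extraction_clients(events, min_queries=100, min_similarity=0.9):
    """events: iterable of (client_id, prompt) pairs from API logs.
    Flag clients with many queries whose consecutive prompts are near-duplicates,
    a common signature of systematic model-extraction sweeps."""
    by_client = defaultdict(list)
    for client, prompt in events:
        by_client[client].append(prompt)
    flagged = []
    for client, prompts in by_client.items():
        if len(prompts) < min_queries:
            continue
        sims = [SequenceMatcher(None, a, b).ratio()
                for a, b in zip(prompts, prompts[1:])]
        if sum(sims) / len(sims) >= min_similarity:
            flagged.append(client)
    return flagged
```

A real Sigma deployment would express this as field matches and aggregation conditions over a SIEM's log schema; the heuristic shape is the same.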

Appendix C: Tools & Scripts Reference

| Tool | Purpose | Coverage | License |
|------|---------|----------|---------|
| PromptGuard 2 (Meta) | Real-time prompt injection classifier | T1, T2, T9 | Apache 2.0 |
| LlamaFirewall (Meta) | Comprehensive AI firewall (input + agent + code) | T1, T2, T7, T11 | Apache 2.0 |
| CaMeL (Google DeepMind) | Dual-LLM architecture with capability-based access | T11 | Research |
| PEFTGuard (Open Source) | Backdoor detection in PEFT (LoRA) adapters | T13 | Open Source |
| DRS Defense (Research) | Data Randomized Smoothing for training poisoning | T6 | Research |
| SafeTensors (HuggingFace) | Safe model serialization format (no code execution) | T13 | Apache 2.0 |
| Garak (NVIDIA) | LLM vulnerability scanner | T1–T8 | Apache 2.0 |
| PyRIT (Microsoft) | Python Risk Identification Toolkit for generative AI | T1–T12 | MIT |

Appendix D: Assessment Templates

AI Security Assessment Checklist

Pre-Assessment

PRE-1: Asset inventory complete (models, agents, RAG, pipelines)
PRE-2: AATMF tactic applicability matrix populated
PRE-3: Rules of engagement signed
PRE-4: Baseline security controls documented
PRE-5: Rollback procedures verified

Assessment

ASS-1: Input sanitization tested (T1–T3 techniques)
ASS-2: Encoding evasion tested (T2 techniques)
ASS-3: Multi-turn attack sequences executed (T4)
ASS-4: API abuse patterns tested (T5)
ASS-5: Output manipulation attempted (T7)
ASS-6: Multimodal injection tested (T9, if applicable)
ASS-7: Agentic exploitation attempted (T11, if applicable)
ASS-8: RAG poisoning tested (T12, if applicable)

Post-Assessment

POST-1: All findings documented with AATMF classification
POST-2: Risk scores calculated using AATMF-R v3
POST-3: Remediation recommendations provided
POST-4: Compliance mapping completed
POST-5: Report delivered and findings walkthrough conducted

Finding Report Template

# Finding: [Title]

## Classification
- AATMF Tactic: T[n] — [Name]
- AATMF Technique: T[n]-AT-[seq]
- Risk Score: [score] ([CRITICAL/HIGH/MEDIUM/LOW/INFO])
- CVSS v3.1: [score] (if applicable)

## Description
[Clear description of the vulnerability]

## Proof of Concept
[Steps to reproduce, including exact prompts/inputs]

## Impact
[Business and technical impact assessment]

## Mitigation
[Specific remediation steps]

## Compliance Mapping
- OWASP LLM Top 10: [LLM0x]
- MITRE ATLAS: [AML.Txxxx]
- EU AI Act: [Article]
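Teams that generate reports programmatically can enforce the required classification fields before rendering. The helper below is an illustrative sketch, not part of AATMF; the field names mirror the template above, and the template excerpt is abbreviated.

```python
# Abbreviated skeleton of the finding report; extend with the remaining sections.
FINDING_TEMPLATE = """\
# Finding: {title}

## Classification
- AATMF Tactic: {tactic}
- AATMF Technique: {technique}
- Risk Score: {score} ({severity})
"""

REQUIRED_FIELDS = {"title", "tactic", "technique", "score", "severity"}

def render_finding(fields: dict) -> str:
    """Reject findings missing classification fields, then fill the skeleton."""
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"finding is missing fields: {sorted(missing)}")
    return FINDING_TEMPLATE.format(**fields)
```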

Appendix E: Case Studies

Real-world attacks and research findings from 2025–2026 that shaped AATMF v3.

E.1: Policy Puppetry — Universal Model Bypass (HiddenLayer, April 2025)
Tactics: T1, T2, T3

Reformulating adversarial prompts as XML, INI, or JSON policy configuration files causes LLMs to interpret them as authoritative system-level instructions. The technique achieves a universal bypass across GPT-4o, GPT-4.5, o1, o3-mini, Claude 3.5/3.7, Gemini 1.5/2.0/2.5, Llama 3/4, DeepSeek V3/R1, Qwen 2.5, and Mistral.

Key Insight

Models trained on technical documentation treat configuration-style formatting as high-authority context, overriding safety alignment.
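A coarse input screen for this pattern is straightforward: flag user messages that arrive formatted as configuration files and mention policy-style keys. The sketch below is a heuristic illustration only (the keyword list and function name are our assumptions), not a robust defense.

```python
import json
import re

POLICY_KEYS = ("system", "policy", "role", "instructions", "allowed", "blocked")
_KEY_ALT = "|".join(POLICY_KEYS)

def looks_like_policy_config(user_input: str) -> bool:
    """Heuristic screen for Policy-Puppetry-style inputs: messages structured
    as JSON, INI, or XML config that carry policy-style keys."""
    text = user_input.strip()
    # JSON object whose top-level keys include policy vocabulary
    if text.startswith("{"):
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and {k.lower() for k in obj} & set(POLICY_KEYS):
            return True
    # INI-style section headers such as [system] or [policy]
    if re.search(rf"(?im)^\[({_KEY_ALT})\]", text):
        return True
    # XML-style tags such as <system> or <policy ...>
    if re.search(rf"(?i)<({_KEY_ALT})\b", text):
        return True
    return False
```

Determined attackers can evade keyword screens, which is why layered classifiers such as those in Appendix C are the stronger control.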

E.2: Autonomous LRM Jailbreaking (Nature Communications, August 2025)
Tactics: T3, T4

Four large reasoning models deployed as multi-turn adversarial agents against nine target models achieved 97.14% ASR. More capable reasoning models are paradoxically better at subverting alignment in others.

Key Insight

Reasoning capabilities are attack capabilities. This validates AATMF's prediction that LRM advancements would be weaponized.

E.3: PoisonedRAG — Knowledge Corruption (USENIX Security 2025)
Tactics: T12

Injecting as few as 5 adversarially crafted texts into a knowledge base with millions of clean documents controls the model's responses to specific target questions. ASR reached 99% on HotpotQA.

Key Insight

The semantic similarity search at the heart of RAG is fundamentally exploitable — the same mechanism that makes retrieval useful makes it poisonable.
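The mechanism is easy to reproduce in miniature. The toy retriever below uses bag-of-words cosine similarity in place of real embeddings (an assumption for illustration): a single crafted passage that echoes the target question outranks every clean document.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query, exactly as a naive retriever would."""
    q = Counter(query.lower().split())
    return sorted(corpus, key=lambda doc: cosine(q, Counter(doc.lower().split())),
                  reverse=True)[:k]
```

Because the poisoned passage restates the question verbatim, it scores highest for that query while remaining invisible among millions of documents for every other query, which is exactly the asymmetry PoisonedRAG exploits.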

E.4: MCP Tool Poisoning (Invariant Labs, 2025)
Tactics: T11

The study measured an 84.2% ASR via direct tool-description poisoning, shadow attacks (a malicious server manipulates trusted tools without ever being invoked), and rug pull attacks (silently altering descriptions after approval).

Key Insight

The MCP design — where tool descriptions are processed as natural language — is architecturally vulnerable to injection.

E.5: ShadowMQ — Copy-Pasted RCE (Oligo Security, November 2025)
Tactics: T14

Unsafe ZeroMQ socket patterns were copy-pasted verbatim across major inference frameworks, including vLLM, TensorRT-LLM, and Modular Max Server. Thousands of exposed ZMQ sockets were found on the public internet.

Key Insight

AI infrastructure inherits all traditional software vulnerabilities, amplified by the speed of framework adoption and code reuse without security review.

E.6: 250 Poisoned Documents — Universal Training Backdoor (Turing Institute / Anthropic / UK AISI, October 2025)
Tactics: T6

Injecting just 250 specially crafted documents into training data backdoors models from 600M to 13B parameters trained on up to 260B tokens. The actual threshold for poisoning is negligibly small.

Key Insight

The sheer scale of pretraining data works against defenders. 250 documents in billions is a needle in a haystack that training cannot filter out.
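A back-of-the-envelope calculation makes the asymmetry concrete. Assuming an average document length of 500 tokens (our assumption; the study reports corpus size in tokens), 250 poisoned documents are a vanishing fraction of the largest corpus studied:

```python
# Back-of-the-envelope scale of the 250-document attack surface.
poisoned_docs = 250
corpus_tokens = 260e9       # largest pretraining corpus in the study
tokens_per_doc = 500        # assumed average document length

total_docs = corpus_tokens / tokens_per_doc      # ~520 million documents
poison_fraction = poisoned_docs / total_docs
print(f"{poison_fraction:.2e}")                  # on the order of 5e-7
```

Under these assumptions the poison rate is well below one document in a million, far beneath what corpus-level filtering can reliably catch.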

Appendix F: Glossary

| Term | Definition |
|------|------------|
| AATMF | Adversarial AI Threat Modeling Framework |
| ASR | Attack Success Rate — percentage of attempts that achieve the adversarial objective |
| CaMeL | CApability-Mediated LLM — Google DeepMind's dual-LLM security architecture |
| CoT | Chain-of-Thought — step-by-step reasoning in LLMs |
| DPO | Direct Preference Optimization — alignment training technique |
| DRS | Data Randomized Smoothing — defense against training data poisoning |
| H-CoT | Hijacked Chain-of-Thought — attack that subverts CoT safety reasoning |
| LRM | Large Reasoning Model — models with explicit reasoning capabilities (o1, o3, DeepSeek-R1) |
| MCP | Model Context Protocol — Anthropic's standard for tool integration |
| PEFT | Parameter-Efficient Fine-Tuning — techniques like LoRA for efficient model adaptation |
| RAG | Retrieval-Augmented Generation — architecture combining search with generation |
| RLHF | Reinforcement Learning from Human Feedback — primary alignment technique |
| SafeTensors | Secure model serialization format that prevents code execution |
| TEE | Trusted Execution Environment — hardware-based security enclave |

Key References

  1. HiddenLayer. "Policy Puppetry: A Universal Jailbreak." April 2025.
  2. Zeng et al. "Autonomous LRM Jailbreaking." Nature Communications, August 2025.
  3. Xue et al. "PoisonedRAG: Knowledge Corruption Attacks." USENIX Security 2025.
  4. Invariant Labs. "MCP-ITP: Tool Poisoning in Agentic Systems." April 2025.
  5. Oligo Security. "ShadowMQ: Unsafe Deserialization in AI Inference Frameworks." November 2025.
  6. Sherburn et al. "250 Documents: Universal Pretraining Backdoors." Turing Institute/Anthropic/UK AISI, October 2025.
  7. Anthropic. "GTG-1002: AI-Orchestrated Cyber Campaign." November 2025.
  8. Google DeepMind. "CaMeL: Defeating Prompt Injection by Design." March 2025.
  9. Meta. "LlamaFirewall: Open-Source AI Safety Framework." April 2025.
  10. MITRE. "ATLAS v4.6.0." October 2025.
  11. OWASP. "LLM Top 10 2025." January 2025.
  12. OWASP. "Agentic AI Top 10." December 2025.
  13. NIST. "Cyber AI Profile (IR 8596) Preliminary Draft." December 2025.
  14. European Parliament. "EU AI Act (Regulation 2024/1689)." 2024.
  15. Qi et al. "Safety Alignment Depth." Princeton, May 2025.
  16. Weng et al. "H-CoT: Hijacking Chain-of-Thought." Duke/Accenture, February 2025.
  17. Borghesi et al. "SACRED-Bench: Compositional Audio Attacks." November 2025.