A diagnostic methodology for bypassing LLM defenses. Not a list of jailbreaks — a routing logic. Five defense layers, sequenced. When a prompt fails, the failure mode tells you which layer caught you, which tells you where to pivot. Stop guessing. Start diagnosing.
Static or learned classifiers on the incoming prompt. Cheapest layer to test against — cheapest layer to bypass. Encoding tricks, character substitution, multilingual pivots, delimiter exploitation. If your prompt is rejected without ever reaching the model, you're at L1.
RLHF and constitutional training baked into the weights. The hard layer — can't be patched without retraining. Bypassed via premise re-framing, role assignment, hypothetical pivots. If the model produces a polite refusal that pattern-matches the training data, you're at L2.
The system-prompt layer that defines who the model thinks it is. Bypassed via instruction-hierarchy override, role-injection, persona overwrites. If the refusal cites the system-prompt persona ("As [role], I cannot..."), you're at L3; if the model breaks character or admits to being a different system, you've moved past it.
Classifiers on the model's output before it reaches the user. Bypassed via output-shape manipulation — code blocks, JSON, base64, ASCII art, structured data. If the model generates the content but it gets redacted before delivery, you're at L4.
The boundary that decides whether the model's outputs become tool calls. Bypassed via tool poisoning, MCP injection, context-window flooding of trust signals. Only relevant for agents — and the most consequential layer when it is.
prompt rejected before model response?
├─ yes → L1 input filter
│ pivot: encoding · pad · multilingual · delimiter
│
└─ no, model responded — what kind?
├─ polite refusal w/ training-pattern phrasing → L2 alignment
│ pivot: re-frame premise · role re-assignment · hypothetical
│
├─ "I am [role from system prompt], I cannot..." → L3 identity
│ pivot: instruction-hierarchy override · persona overwrite
│
├─ partial output, then truncated/redacted → L4 output filter
│ pivot: output shape · code block · JSON · base64 · ASCII
│
└─ output landed but tool call refused → L5 agentic trust
pivot: MCP poisoning · context flooding · tool delegation

The Playbook is a field manual because the work is field work. The manual is short, indexed, and printable — designed to live next to the operator, not in a tab that closes when the engagement starts.
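The decision tree above is just a lookup from observed failure mode to layer and pivot set. A minimal sketch, assuming each observation has already been reduced to one of five symptom labels; the label strings, function name, and table structure here are illustrative, not part of any real tooling — the layer names and pivot lists come straight from the tree.

```python
# Routing table: observed symptom -> (defense layer, pivot options).
# Symptom labels are assumed/illustrative; layers and pivots mirror the tree.
SYMPTOM_TO_LAYER = {
    "rejected_before_response": (
        "L1 input filter",
        ["encoding", "pad", "multilingual", "delimiter"],
    ),
    "polite_refusal_training_phrasing": (
        "L2 alignment",
        ["re-frame premise", "role re-assignment", "hypothetical"],
    ),
    "refusal_cites_system_role": (
        "L3 identity",
        ["instruction-hierarchy override", "persona overwrite"],
    ),
    "partial_output_then_redacted": (
        "L4 output filter",
        ["output shape", "code block", "JSON", "base64", "ASCII"],
    ),
    "output_ok_tool_call_refused": (
        "L5 agentic trust",
        ["MCP poisoning", "context flooding", "tool delegation"],
    ),
}


def diagnose(symptom: str) -> tuple[str, list[str]]:
    """Map a reduced failure-mode label to its defense layer and pivots."""
    try:
        return SYMPTOM_TO_LAYER[symptom]
    except KeyError:
        raise ValueError(f"unrecognized symptom: {symptom!r}") from None
```

The point of writing it down as a table is the thesis of the section: the pivot is determined by the diagnosis, not by trial and error.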
Each layer chapter ends with a flow chart, a decision rubric, and three reproducible procedures. The procedures are intentionally generic — not "this prompt for this model on this date" but "this shape of prompt, against any model that exhibits this defense profile." Specifics decay. Shapes don't.