End-to-end methodology for systematic adversarial testing of LLM-based systems — scope, recon, exploitation, scoring, reporting.
Document the full stack: model, system prompt, tools/MCP, retrieval corpus, user input channels, output sinks. Every component is in scope unless explicitly carved out.
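The stack inventory above can be sketched as a record per system under test. This is a minimal illustration, not a prescribed schema; all field and class names are assumptions introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class ScopeInventory:
    """One record per system under test; every field is an attack surface."""
    model: str                                              # base model + version
    system_prompt_ref: str                                  # where the prompt lives
    tools: list[str] = field(default_factory=list)          # tool / MCP endpoints
    retrieval_sources: list[str] = field(default_factory=list)  # corpora the RAG layer reads
    input_channels: list[str] = field(default_factory=list)     # chat UI, API, email ingest, ...
    output_sinks: list[str] = field(default_factory=list)       # where outputs land: DB, shell, email
    carve_outs: list[str] = field(default_factory=list)         # explicitly excluded components

    def in_scope(self, component: str) -> bool:
        # Everything is in scope unless explicitly carved out.
        return component not in self.carve_outs
```

Keeping carve-outs as data, not prose, makes the "in scope unless excluded" rule mechanically checkable.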
Probe for input filters, alignment thresholds, output classifiers, refusal categories. Run baseline benign queries to establish normal behavior.
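A baseline harness for the benign-query step might look like the sketch below. The marker-based refusal check and the stubbed target are illustrative assumptions; a real engagement would plug in the actual client and a tuned judge.

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm not able", "against my guidelines")

def is_refusal(response: str) -> bool:
    # Crude marker heuristic; swap in a proper classifier for real runs.
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def baseline_refusal_rate(query_fn, benign_queries) -> float:
    """Run benign queries through the target and record the refusal rate.

    A nonzero baseline means later refusals can't automatically be read
    as filter hits during exploitation.
    """
    refused = sum(is_refusal(query_fn(q)) for q in benign_queries)
    return refused / len(benign_queries)

def fake_target(prompt: str) -> str:
    # Stub standing in for the system under test.
    return "Sunny." if "weather" in prompt else "I can't help with that."
```

Record the baseline rate alongside the probe transcripts so later anomaly judgments are relative, not absolute.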
Map findings to AATMF tactics. Chain low-severity primitives into high-severity outcomes: a multi-turn chain combining memory manipulation and tool poisoning often beats a stronger single-vector attack.
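One way to reason about chaining is a toy severity model: the chain inherits its strongest primitive and gains a step per additional link, capped at critical. The class names, the 1-5 scale, and the escalation rule are all assumptions for illustration, not part of AATMF.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Primitive:
    name: str
    tactic: str    # AATMF tactic ID (placeholder strings here)
    severity: int  # standalone severity, 1 (low) .. 5 (critical)

def chain_severity(primitives, escalation: int = 1) -> int:
    """Toy model: chained severity = max standalone severity plus one
    escalation step per additional primitive, capped at 5."""
    if not primitives:
        return 0
    base = max(p.severity for p in primitives)
    return min(5, base + escalation * (len(primitives) - 1))
```

Even under this crude model, three severity-2 primitives chain to severity 4, which is the point of the tactic: composition, not raw strength.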
Each finding ships with a working repro, scoped harm, and a realistic threat model. Avoid purely theoretical impact claims; defenders dismiss them.
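The finding contract can be enforced as a record whose shipping gate requires a repro and a harm statement. A minimal sketch; the field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    repro_steps: list[str] = field(default_factory=list)  # exact transcript / tool calls
    harm: str = ""                                        # scoped, concrete harm
    threat_model: str = ""                                # who attacks, with what access

    def is_reportable(self) -> bool:
        # No working repro or no concrete harm -> the finding doesn't ship.
        return bool(self.repro_steps) and bool(self.harm) and bool(self.threat_model)
```

Gating on the record rather than reviewer judgment keeps theoretical-only findings out of the report by construction.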
Quantify, don't editorialize. AATMF-R = L × I × E × D × R × C. Scored on a fixed scale, so results are comparable across engagements.
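The multiplicative score is trivial to compute; what makes it comparable is holding the rating scale fixed. The 1-5 scale below is an assumption, not specified by the formula, and the factor letters are passed through as opaque ratings.

```python
from math import prod

def aatmf_r_score(l: int, i: int, e: int, d: int, r: int, c: int) -> int:
    """AATMF-R = L x I x E x D x R x C.

    Assumes each factor is rated on a shared 1-5 scale; the fixed scale,
    not the product itself, is what makes scores comparable across
    engagements.
    """
    factors = (l, i, e, d, r, c)
    if not all(1 <= f <= 5 for f in factors):
        raise ValueError("all factors must be on the 1-5 scale")
    return prod(factors)
```

Under this assumed scale the score ranges from 1 (all factors minimal) to 15625 (all factors maximal).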
Technical writeup for engineers, exec summary for stakeholders, remediation playbook for the security team. Every finding gets all three.
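The three-artifact rule is easy to enforce mechanically before a report ships. A small sketch, assuming findings are carried as dicts keyed by artifact name (the key names are placeholders).

```python
REQUIRED_ARTIFACTS = ("technical_writeup", "exec_summary", "remediation_playbook")

def missing_artifacts(finding: dict) -> list[str]:
    """Return which of the three required report artifacts are absent or empty."""
    return [a for a in REQUIRED_ARTIFACTS if not finding.get(a)]
```

Running this as a pre-delivery check turns "every finding gets all three" from a convention into a gate.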