Does it work with any LLM?

Yes — any LLM accessible via API (OpenAI, Anthropic, Cohere, local Ollama, etc.). Adapter pattern.

How does it relate to AATMF the framework?

The toolkit operationalizes the framework — every test in the toolkit maps to one or more AATMF tactics.

live

aatmf toolkit
python · cli
apache 2.0

Python 3.11+
2,152+ procedures
any LLM endpoint

AATMF
Toolkit.

The Python CLI that operationalizes AATMF. Three-layer evaluation pipeline, defense fingerprinting, decay tracking, attack-chain planning. Drops in a target endpoint, picks procedures by tactic or risk score, and emits an AATMF-R-scored report. Built for the team that wants to run AATMF as a continuous control, not as a one-off engagement.

github →aatmf framework →

01 · pipeline

Three layers. Each layer answers a different question.

Three-Layer Eval Pipeline.

Defense fingerprinting

What defenses is this endpoint actually running? Probe set runs against the target before any real attack — input filters, alignment training, output classifier, agentic boundary, rate limit. Output: a profile that determines which procedures are worth running and which would burn budget.

Procedure execution

AATMF procedures run against the profiled endpoint. Skip procedures the L1 profile says will be no-ops; prioritize procedures the profile says will land. Each procedure's success/failure is logged, not just summarized.

AATMF-R scoring + report

Every result scored on the six AATMF-R factors (L · I · E · D · R · C). Report exports to JSON, Markdown, and a slide-ready PDF. Mappings to MITRE ATLAS, NIST AI RMF, and EU AI Act emit alongside the raw scores.

02 · install

Standard pip install. Works against any LLM that responds to HTTP.

Install & First Run.

# install
$ pip install aatmf-toolkit

# fingerprint a target
$ aatmf fingerprint --endpoint https://api.example.com/v1/chat

# run a tactic
$ aatmf run --tactic T11 --endpoint https://api.example.com/v1/chat \
 --report report.md

# run the full battery, scored
$ aatmf run --all --endpoint <url> --output report.json --format aatmf-r

03 · features

The capabilities that distinguish this from a prompt-list runner.

What's Inside.

Defense fingerprinting

Identify the actual defenses on a target before attacking. Skip the procedures that will obviously fail.

Decay tracking

Re-run a baseline against the same endpoint over time. Detect when a defense regresses — a CI signal, not a one-shot evaluation.

Attack-chain planner

Compose multi-turn procedures into chains. The planner picks turn ordering that maximizes downstream landing rate.

Pluggable procedures

Drop a YAML file in /procedures and it runs. Custom procedures version alongside the toolkit.

Headless or interactive

CI mode for scheduled runs. Interactive mode for triage and one-off investigation.

Compliance export

· MITRE ATLAS · NIST AI RMF · EU AI Act mappings emit automatically with each report.

more frameworks all frameworks →

AATMF →Adversarial AI threat modeling SEF →Social engineering framework P.R.O.M.P.T →Compositional grammar Claude-Red →Skills library Playbook →Diagnostic methodology

AATMFToolkit.

Three-Layer Eval Pipeline.

Install & First Run.

What's Inside.

AATMF
Toolkit.