
How to Secure an LLM Agent

Defense-in-depth for tool-using agents: trust boundaries, tool-call auditing, memory hygiene, and human-in-the-loop checkpoints on high-risk paths.

Step 1

Treat all input as untrusted

User text, retrieved documents, tool outputs, and agent-to-agent messages can all carry an injected instruction. Tag every piece of content with its origin and minimize mixing across trust boundaries.
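
A minimal sketch of origin tagging; Origin, TaggedContent, and the <source> wrapper format are illustrative names, not a standard API:

```python
# Minimal origin tagging: every span of model-facing content carries a
# label, and nothing starts trusted.
from dataclasses import dataclass
from enum import Enum

class Origin(Enum):
    USER = "user"
    RETRIEVAL = "retrieval"
    TOOL_OUTPUT = "tool_output"
    AGENT_MESSAGE = "agent_message"

# Promote origins deliberately, if ever; default is an empty trusted set.
TRUSTED_ORIGINS: set[Origin] = set()

@dataclass(frozen=True)
class TaggedContent:
    text: str
    origin: Origin

    @property
    def trusted(self) -> bool:
        return self.origin in TRUSTED_ORIGINS

def build_context(items: list[TaggedContent]) -> str:
    # Wrap each span in an origin marker so policy code (and incident
    # reviewers) can see exactly where every instruction came from.
    return "\n".join(
        f"<source origin={c.origin.value} trusted={c.trusted}>\n{c.text}\n</source>"
        for c in items
    )
```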

Step 2

Constrain the tool surface

Each agent gets the minimum set of tools it needs. No general 'shell' or 'execute' tools without isolation. Every tool gets parameter validation and an output-trust label.
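
One way to express this: a per-agent allowlist where each tool carries its own validator and output-trust label. All names here are illustrative, not a real SDK:

```python
# A constrained tool surface: agents can only reach tools on their
# allowlist, and every call is validated before it runs.
from typing import Any, Callable

class Tool:
    def __init__(
        self,
        fn: Callable[..., str],
        validate: Callable[[dict[str, Any]], None],  # raises ValueError on bad params
        output_trust: str = "untrusted",             # tool outputs default to untrusted
    ):
        self.fn = fn
        self.validate = validate
        self.output_trust = output_trust

def read_ticket(ticket_id: str) -> str:
    return f"(contents of ticket {ticket_id})"

def validate_ticket_params(params: dict[str, Any]) -> None:
    if not str(params.get("ticket_id", "")).isdigit():
        raise ValueError("ticket_id must be numeric")

# This agent handles support tickets, so it gets exactly one tool:
# no shell, no generic HTTP, no file I/O.
SUPPORT_AGENT_TOOLS: dict[str, Tool] = {
    "read_ticket": Tool(read_ticket, validate_ticket_params),
}

def call_tool(name: str, params: dict[str, Any]) -> tuple[str, str]:
    tool = SUPPORT_AGENT_TOOLS.get(name)
    if tool is None:
        raise PermissionError(f"tool {name!r} is not on this agent's allowlist")
    tool.validate(params)
    return tool.fn(**params), tool.output_trust
```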

Step 3

Audit every tool call

Log inputs, outputs, and the prompt context that triggered each call. Anomaly-detect on call frequency, parameter content, and chain depth.
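
A sketch of an audit wrapper, assuming you control the call path; the thresholds and log schema are placeholders to adapt:

```python
# Tool-call auditing: structured logging plus two cheap anomaly checks
# (call rate and chain depth) that fail closed.
import json
import logging
import time
from collections import deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_audit")

MAX_CALLS_PER_MINUTE = 30
MAX_CHAIN_DEPTH = 5
_recent_calls: deque = deque(maxlen=MAX_CALLS_PER_MINUTE)

def audited_call(tool_name, params, prompt_context, chain_depth, fn):
    now = time.time()
    if chain_depth > MAX_CHAIN_DEPTH:
        raise RuntimeError(f"chain depth {chain_depth} exceeds {MAX_CHAIN_DEPTH}")
    if len(_recent_calls) == _recent_calls.maxlen and now - _recent_calls[0] < 60:
        raise RuntimeError("tool-call rate limit exceeded")
    _recent_calls.append(now)

    result = fn(**params)
    # Record everything needed to reconstruct the call after an incident.
    log.info(json.dumps({
        "tool": tool_name,
        "params": params,
        "result_preview": str(result)[:200],
        "prompt_context": prompt_context[:500],
        "chain_depth": chain_depth,
        "ts": now,
    }, default=str))
    return result
```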

Step 4

Prevent memory poisoning

Memory writes are an attack vector (see /ai-security/self-replicating-memory-worm/). Validate every write, tag its origin, and expire suspicious entries.
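
A sketch of write-side hygiene, assuming a simple in-process store; the marker list and TTL are starting points, not a complete filter:

```python
# Memory hygiene: reject instruction-like writes, tag each entry with its
# origin, and let untrusted-origin entries expire.
import time
from dataclasses import dataclass, field

SUSPICIOUS_MARKERS = ("ignore previous", "you must always", "system:", "<|")
UNTRUSTED_TTL_SECONDS = 24 * 3600

@dataclass
class MemoryEntry:
    text: str
    origin: str  # e.g. "user", "tool_output", "retrieval"
    created_at: float = field(default_factory=time.time)

class AgentMemory:
    def __init__(self) -> None:
        self._entries: list[MemoryEntry] = []

    def write(self, text: str, origin: str) -> None:
        # Reject writes that look like smuggled instructions.
        lowered = text.lower()
        if any(marker in lowered for marker in SUSPICIOUS_MARKERS):
            raise ValueError("memory write rejected: instruction-like content")
        self._entries.append(MemoryEntry(text, origin))

    def read(self) -> list[MemoryEntry]:
        # Entries from untrusted origins expire; user-origin entries persist.
        now = time.time()
        self._entries = [
            e for e in self._entries
            if e.origin == "user" or now - e.created_at < UNTRUSTED_TTL_SECONDS
        ]
        return list(self._entries)
```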

Step 5

Add human checkpoints

High-risk actions (file writes, network calls, financial operations) require human-in-the-loop approval. No autonomous escalation.
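
A minimal gate, with stdin standing in for whatever approval channel (pager, ticket queue, review UI) you actually run; the tool names are placeholders:

```python
# Human-in-the-loop gate: high-risk calls are denied by default and only
# proceed on an explicit human yes.
HIGH_RISK_TOOLS = {"write_file", "http_request", "transfer_funds"}

def require_approval(tool_name: str, params: dict) -> bool:
    answer = input(f"Approve {tool_name} with {params}? [y/N] ")
    return answer.strip().lower() == "y"

def gated_call(tool_name: str, params: dict, fn):
    if tool_name in HIGH_RISK_TOOLS and not require_approval(tool_name, params):
        raise PermissionError(f"{tool_name} denied by human reviewer")
    return fn(**params)
```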

Step 6

Test with AATMF T11

Tactic 11 (Agentic & Orchestrator Exploitation) has 16 techniques specifically for agents. Run them against your stack quarterly.
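
A sketch of what a quarterly run could look like, assuming you maintain your own payload corpus keyed by technique ID; the payload file, agent interface, and pass criteria are all assumptions, and AATMF itself does not ship this harness:

```python
# Quarterly T11 regression: replay technique payloads against the agent
# and flag any that drive a tool call or trip policy.
import json

def run_t11_suite(agent, payload_path: str = "aatmf_t11_payloads.json") -> list[str]:
    with open(payload_path) as f:
        payloads = json.load(f)  # e.g. {"T11.x": ["payload text", ...], ...}

    failures = []
    for technique_id, prompts in payloads.items():
        for prompt in prompts:
            result = agent.run(prompt)  # assumed agent interface
            # Fail the technique if any payload triggers a tool call or policy hit.
            if result.tool_calls or result.policy_violation:
                failures.append(technique_id)
                break
    return failures
```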
