Skip to main content
Menu
Adversarial AI

AI Security Research

Testing machine trust reflexes

Latest Research

Featured

New March 2026 · 18 min read

Weaponized AI Supply Chain: How Threat Actors Turned LLMs Into Attack Infrastructure

89% increase in AI-enabled attacks. LLM-integrated malware that thinks at runtime, the first AI-orchestrated cyber espionage campaign, and $1.1B in deepfake fraud. The complete offensive AI arsenal mapped.

Offensive AI LLM Malware Supply Chain Deepfake
Read Full Research →
February 2026 · 22 min read

The 30% Blind Spot: Why LLM Safety Judges Fail

I built an LLM safety judge. Six iterations, 680+ responses, 75% agreement target. It passed — while missing 63% of unsafe content. Every major AI provider uses this same architecture.

LLM Safety RAI Judge Guardrails Red Teaming
Read Full Research →
March 2026 · 18 min read

MCP vs A2A: Every Trust Boundary Mapped

30+ CVEs vs zero. MCP's tool descriptions create an attack surface where metadata becomes executable intent. Side-by-side comparison across 11 attack categories with real-world breach analysis.

MCP A2A Attack Surface Protocol Security
Read Full Research →
February 2026 · 18 min read

AI Coding Agent Attack Surface: A Full Taxonomy

Cursor, Copilot, and Claude Code inherited your trust problems. Full taxonomy of 7 attack vector categories — from repository context poisoning to multi-agent exploitation — with defensive recommendations for developers, vendors, and organizations.

Coding Agents Attack Surface MCP Trust Model
Read Full Research →
February 2026 · 20 min read

Computational Countertransference: Why LLMs Adopt What They Analyze

Paste a jailbroken transcript into a new LLM session and the model adopts the adversarial state. 13-month longitudinal study across GPT-4o, Claude Opus 4.6, and Gemini 3 Pro reveals context inheritance as an architectural vulnerability rooted in function vector heads.

Context Inheritance In-Context Learning Function Vectors Inherited Vulnerabilities
Read Full Research →
February 2026 · 25 min read

The Agentic AI Threat Landscape

Full threat landscape mapping: prompt injection, MCP tool poisoning, multi-agent infection, memory poisoning, and why no single defense works.

Agentic AI MCP Prompt Injection
Read Full Report →
February 2026 · 16 min read

AATMF vs MITRE ATLAS: Which AI Security Framework Wins?

MITRE ATLAS: 66 techniques. AATMF v3.1: 240 techniques, 4,980+ prompts, quantitative risk scoring. A practitioner's comparison of the two leading AI threat modeling frameworks.

Framework Comparison AATMF MITRE ATLAS
Read Full Comparison →
February 2026 · 14 min read

AI Gateway Threat Model: 8 Attack Vectors

First generalized AI gateway threat model. 91K attack sessions analyzed. 8 unmapped attack vectors from API key aggregation to model downgrade attacks. Introducing AATMF TC-21.

AI Gateway Threat Modeling AATMF Enterprise Security
Read Full Research →
Core Principle

Why AI Systems Are Vulnerable

AI security isn't a new field with new principles. It's an established field — adversarial psychology — applied to a new substrate.

When I research LLM jailbreaks, I'm not searching for novel vulnerability classes unique to machine learning. I'm testing whether the psychological exploitation techniques that work on humans also work on machines trained on human data. The patterns are remarkably consistent.

Authority Compliance

Tell a human you're from IT and they'll reset their password without verification. Frame instructions to an LLM with authority markers and safety guidelines become negotiable.

Gradual Escalation

Ask a human for something inappropriate and they'll refuse. Ask for something small, then larger — the threshold shifts. LLMs exhibit similar drift across conversation turns.

Social Proof

Humans comply more readily when they believe others have complied. LLMs respond more permissively to requests framed as common or previously approved.

Reciprocity

Establish helpful patterns with a human, and subsequent requests get more latitude. The same dynamic appears in multi-turn LLM interactions.

These parallels aren't coincidental. LLMs learned language — and the social dynamics encoded in language — from human-generated text. The trust reflexes came bundled with the grammar.

Framework

AATMF: Systematizing AI Threat Intelligence

This research feeds into AATMF — the Adversarial AI Threat Modeling Framework. AATMF systematizes these attack vectors into a taxonomy that security teams can use for threat modeling, red team assessments, and defensive architecture decisions.

If you're here to understand the attacks, explore the research areas above. If you're here to defend against them, start with AATMF.