Adversarial AI

AI Security Research

Testing machine trust reflexes

Latest Research

Featured

New March 2026 · 18 min read

Weaponized AI Supply Chain: How Threat Actors Turned LLMs Into Attack Infrastructure

89% increase in AI-enabled attacks. LLM-integrated malware that thinks at runtime, the first AI-orchestrated cyber espionage campaign, and $1.1B in deepfake fraud. The complete offensive AI arsenal mapped.

Offensive AI LLM Malware Supply Chain Deepfake

Read Full Research →

February 2026 · 22 min read

The 30% Blind Spot: Why LLM Safety Judges Fail

I built an LLM safety judge. Six iterations, 680+ responses, 75% agreement target. It passed — while missing 63% of unsafe content. Every major AI provider uses this same architecture.

LLM Safety RAI Judge Guardrails Red Teaming

Read Full Research →

March 2026 · 18 min read

MCP vs A2A: Every Trust Boundary Mapped

30+ CVEs vs zero. MCP's tool descriptions create an attack surface where metadata becomes executable intent. Side-by-side comparison across 11 attack categories with real-world breach analysis.

MCP A2A Attack Surface Protocol Security

Read Full Research →

February 2026 · 18 min read

AI Coding Agent Attack Surface: A Full Taxonomy

Cursor, Copilot, and Claude Code inherited your trust problems. Full taxonomy of 7 attack vector categories — from repository context poisoning to multi-agent exploitation — with defensive recommendations for developers, vendors, and organizations.

Coding Agents Attack Surface MCP Trust Model

Read Full Research →

February 2026 · 20 min read

Computational Countertransference: Why LLMs Adopt What They Analyze

Paste a jailbroken transcript into a new LLM session and the model adopts the adversarial state. 13-month longitudinal study across GPT-4o, Claude Opus 4.6, and Gemini 3 Pro reveals context inheritance as an architectural vulnerability rooted in function vector heads.

Context Inheritance In-Context Learning Function Vectors Inherited Vulnerabilities

Read Full Research →

February 2026 · 25 min read

The Agentic AI Threat Landscape

Full threat landscape mapping: prompt injection, MCP tool poisoning, multi-agent infection, memory poisoning, and why no single defense works.

Agentic AI MCP Prompt Injection

Read Full Report →

February 2026 · 16 min read

AATMF vs MITRE ATLAS: Which AI Security Framework Wins?

MITRE ATLAS: 66 techniques. AATMF v3.1: 240 techniques, 4,980+ prompts, quantitative risk scoring. A practitioner's comparison of the two leading AI threat modeling frameworks.

Framework Comparison AATMF MITRE ATLAS

Read Full Comparison →

February 2026 · 14 min read

AI Gateway Threat Model: 8 Attack Vectors

First generalized AI gateway threat model. 91K attack sessions analyzed. 8 unmapped attack vectors from API key aggregation to model downgrade attacks. Introducing AATMF TC-21.

AI Gateway Threat Modeling AATMF Enterprise Security

Read Full Research →

Core Principle

Why AI Systems Are Vulnerable

AI security isn't a new field with new principles. It's an established field — adversarial psychology — applied to a new substrate.

When I research LLM jailbreaks, I'm not searching for novel vulnerability classes unique to machine learning. I'm testing whether the psychological exploitation techniques that work on humans also work on machines trained on human data. The patterns are remarkably consistent.

Authority Compliance

Tell a human you're from IT and they'll reset their password without verification. Frame instructions to an LLM with authority markers and safety guidelines become negotiable.

Gradual Escalation

Ask a human for something inappropriate and they'll refuse. Ask for something small, then larger — the threshold shifts. LLMs exhibit similar drift across conversation turns.

Social Proof

Humans comply more readily when they believe others have complied. LLMs respond more permissively to requests framed as common or previously approved.

Reciprocity

Establish helpful patterns with a human, and subsequent requests get more latitude. The same dynamic appears in multi-turn LLM interactions.

These parallels aren't coincidental. LLMs learned language — and the social dynamics encoded in language — from human-generated text. The trust reflexes came bundled with the grammar.

Specializations

AI Security Research Areas

Guardrail Bypass

Jailbreaking

Exploring techniques for bypassing AI safety controls through psychological vectors — understanding why certain patterns work to build defenses that address causes rather than symptoms.

Context Inheritance Exploit — jailbroken states persisting across sessions

Multi-turn escalation & persona manipulation patterns

Explore Jailbreaking Research → Input Manipulation

Prompt Injection

Inserting malicious instructions into AI input to override system behavior. Direct injection targets user inputs; indirect injection hides payloads in external data sources.

The Custom Instruction Backdoor — emergent prompt injection via settings

MCP Security Deep Dive & RAG poisoning patterns

Explore Prompt Injection Research → Persistent Attacks

Memory Manipulation

As AI systems gain persistent memory, attackers can poison context to compromise future interactions — not just the current session.

Preference Injection Persistence & RLHF signal poisoning

Cross-session state transfer beyond session boundaries

Explore Memory Research → Autonomous Systems

Agentic AI Security

Autonomous AI agents introduce attack surfaces beyond the model itself: planners, tool routers, executors, and inter-agent communication channels.

Plan hijacking & tool-routing poisoning

Delegation loops between cooperating agents

Explore Agentic Security →

All AI Security Research

Red Teaming Feb 2026

The LLM Red Teamer's Playbook

Structured methodology for adversarial testing of LLM applications using AATMF tactics.

Detection Feb 2026

The AI Breach Detection Gap

Why traditional detection fails for AI-specific breaches and what to do about it.

Sandbox Security Feb 2025

RCE & DNS Exfiltration in ChatGPT Canvas

Python Pickle RCE and DNS exfiltration in ChatGPT's Code Interpreter sandbox.

Architecture Jan 2025

Structural Vulnerabilities in LLMs

Architectural weaknesses baked into transformer-based language models.

Offensive AI Jun 2024

Hidden Risks: An Offensive Perspective

Attack vectors that defenders overlook when securing AI deployments.

Social Engineering Aug 2025

AI Social Engineering & Deepfakes

How AI amplifies social engineering through synthetic media and deepfake technology.

Framework

AATMF: Systematizing AI Threat Intelligence

This research feeds into AATMF — the Adversarial AI Threat Modeling Framework. AATMF systematizes these attack vectors into a taxonomy that security teams can use for threat modeling, red team assessments, and defensive architecture decisions.

If you're here to understand the attacks, explore the research areas above. If you're here to defend against them, start with AATMF.

Defense starts here

AI Security Research

Featured

Weaponized AI Supply Chain: How Threat Actors Turned LLMs Into Attack Infrastructure

The 30% Blind Spot: Why LLM Safety Judges Fail

MCP vs A2A: Every Trust Boundary Mapped

AI Coding Agent Attack Surface: A Full Taxonomy

Computational Countertransference: Why LLMs Adopt What They Analyze

The Agentic AI Threat Landscape

AATMF vs MITRE ATLAS: Which AI Security Framework Wins?

AI Gateway Threat Model: 8 Attack Vectors

Why AI Systems Are Vulnerable

Authority Compliance

Gradual Escalation

Social Proof

Reciprocity

AI Security Research Areas

Jailbreaking

Prompt Injection

Memory Manipulation

Agentic AI Security

All AI Security Research

The LLM Red Teamer's Playbook

The AI Breach Detection Gap

RCE & DNS Exfiltration in ChatGPT Canvas

Structural Vulnerabilities in LLMs

Hidden Risks: An Offensive Perspective

AI Social Engineering & Deepfakes

AATMF: Systematizing AI Threat Intelligence

Explore AATMF →

AATMF Toolkit →

TheJailBreakChef Engine →