AI Security Research
Testing machine trust reflexes
Featured
Weaponized AI Supply Chain: How Threat Actors Turned LLMs Into Attack Infrastructure
89% increase in AI-enabled attacks. LLM-integrated malware that thinks at runtime, the first AI-orchestrated cyber espionage campaign, and $1.1B in deepfake fraud. The complete offensive AI arsenal mapped.
The 30% Blind Spot: Why LLM Safety Judges Fail
I built an LLM safety judge. Six iterations, 680+ responses, 75% agreement target. It passed — while missing 63% of unsafe content. Every major AI provider uses this same architecture.
MCP vs A2A: Every Trust Boundary Mapped
30+ CVEs vs zero. MCP's tool descriptions create an attack surface where metadata becomes executable intent. Side-by-side comparison across 11 attack categories with real-world breach analysis.
AI Coding Agent Attack Surface: A Full Taxonomy
Cursor, Copilot, and Claude Code inherited your trust problems. Full taxonomy of 7 attack vector categories — from repository context poisoning to multi-agent exploitation — with defensive recommendations for developers, vendors, and organizations.
Computational Countertransference: Why LLMs Adopt What They Analyze
Paste a jailbroken transcript into a new LLM session and the model adopts the adversarial state. 13-month longitudinal study across GPT-4o, Claude Opus 4.6, and Gemini 3 Pro reveals context inheritance as an architectural vulnerability rooted in function vector heads.
The Agentic AI Threat Landscape
Full threat landscape mapping: prompt injection, MCP tool poisoning, multi-agent infection, memory poisoning, and why no single defense works.
AATMF vs MITRE ATLAS: Which AI Security Framework Wins?
MITRE ATLAS: 66 techniques. AATMF v3.1: 240 techniques, 4,980+ prompts, quantitative risk scoring. A practitioner's comparison of the two leading AI threat modeling frameworks.
AI Gateway Threat Model: 8 Attack Vectors
First generalized AI gateway threat model. 91K attack sessions analyzed. 8 unmapped attack vectors from API key aggregation to model downgrade attacks. Introducing AATMF TC-21.
Why AI Systems Are Vulnerable
AI security isn't a new field with new principles. It's an established field — adversarial psychology — applied to a new substrate.
When I research LLM jailbreaks, I'm not searching for novel vulnerability classes unique to machine learning. I'm testing whether the psychological exploitation techniques that work on humans also work on machines trained on human data. The patterns are remarkably consistent.
Authority Compliance
Tell a human you're from IT and they'll reset their password without verification. Frame instructions to an LLM with authority markers and safety guidelines become negotiable.
Gradual Escalation
Ask a human for something inappropriate and they'll refuse. Ask for something small, then larger — the threshold shifts. LLMs exhibit similar drift across conversation turns.
Social Proof
Humans comply more readily when they believe others have complied. LLMs respond more permissively to requests framed as common or previously approved.
Reciprocity
Establish helpful patterns with a human, and subsequent requests get more latitude. The same dynamic appears in multi-turn LLM interactions.
These parallels aren't coincidental. LLMs learned language — and the social dynamics encoded in language — from human-generated text. The trust reflexes came bundled with the grammar.
AI Security Research Areas
Jailbreaking
Exploring techniques for bypassing AI safety controls through psychological vectors — understanding why certain patterns work to build defenses that address causes rather than symptoms.
Prompt Injection
Inserting malicious instructions into AI input to override system behavior. Direct injection targets user inputs; indirect injection hides payloads in external data sources.
Memory Manipulation
As AI systems gain persistent memory, attackers can poison context to compromise future interactions — not just the current session.
Agentic AI Security
Autonomous AI agents introduce attack surfaces beyond the model itself: planners, tool routers, executors, and inter-agent communication channels.
All AI Security Research
The LLM Red Teamer's Playbook
Structured methodology for adversarial testing of LLM applications using AATMF tactics.
The AI Breach Detection Gap
Why traditional detection fails for AI-specific breaches and what to do about it.
RCE & DNS Exfiltration in ChatGPT Canvas
Python Pickle RCE and DNS exfiltration in ChatGPT's Code Interpreter sandbox.
Structural Vulnerabilities in LLMs
Architectural weaknesses baked into transformer-based language models.
Hidden Risks: An Offensive Perspective
Attack vectors that defenders overlook when securing AI deployments.
AI Social Engineering & Deepfakes
How AI amplifies social engineering through synthetic media and deepfake technology.
AATMF: Systematizing AI Threat Intelligence
This research feeds into AATMF — the Adversarial AI Threat Modeling Framework. AATMF systematizes these attack vectors into a taxonomy that security teams can use for threat modeling, red team assessments, and defensive architecture decisions.
If you're here to understand the attacks, explore the research areas above. If you're here to defend against them, start with AATMF.
Explore AATMF →
20 tactics, 240+ techniques, quantitative risk scoring. Crosswalks to OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS.
AATMF Toolkit →
Automated AATMF-R risk scoring, Red-Card YAML scenarios for CI/CD, and crosswalk validation.
Try it yourselfTheJailBreakChef Engine →
Apply AATMF tactics interactively. Transform raw intent into structured adversarial prompts.