2025-08-09

Revisiting the MCP Protocol: Deep Security Dive Across Real-World Vulnerabilities

By Kai Aizen — SnailSploit



In late 2024, Anthropic released the Model Context Protocol (MCP) — marketed as the “USB-C of AI apps”. It was designed to unify how AI assistants connect to tools, data sources, and APIs. Instead of bespoke plugins, you get a standardized registry: point your LLM at an MCP server, and suddenly it can file Jira tickets, query databases, or post in Slack.

The promise was irresistible. By mid-2025, OpenAI, DeepMind, Microsoft, and multiple OSS agent projects were running MCP in production. Microsoft even began weaving it into Windows AI Foundry, enabling agents to discover and run system tools by simply enumerating a registry.

But in the rush to adopt, security design lagged behind capability expansion. MCP doesn’t just extend what a model can do — it reshapes the attack surface entirely.

If you’ve read my Custom-Instruction Backdoor article, you know that persistence is the adversary’s best friend. With MCP, persistence can live in toolchains, not just context.


From Plugin Calls to Orchestration Hubs

Before MCP, plugins were single-purpose: one call in, one call out, tight permissions.
MCP changes that by turning the LLM into an orchestrator — passing context between tools, aggregating results, and making multi-step decisions without a human in the loop.

This introduces three structural shifts:

  1. Blast Radius Aggregation — One compromised tool can pivot across all connected tools.
  2. Identity Sprawl — Confusion between “user”, “assistant”, “tool”, and “upstream agent” opens privilege escalation paths.
  3. Prompt/Tool Fusion — A payload hidden in natural language can jump straight into executable tool actions.

In the Adversarial AI Threat Modeling Framework (AATMF), these map directly to:

  • System Role Injection
  • Hidden Context Trojan
  • Distributed Prompt Fragmentation

Real-World MCP Vulnerabilities

1. Asana’s Tenant Isolation Gap

In June 2025, Asana pulled its MCP server offline after finding a cross-tenant access flaw. No breach was confirmed, but the flaw could have exposed internal project data to unauthorized tenants.
AATMF mapping: Persona Override, Context Accumulation.

2. Low-Skill, High-Impact Trojan

A research paper showed how a “weather” MCP tool could siphon financial data by proxying calls to a legitimate banking tool. No advanced exploits required, just protocol abuse.
AATMF mapping: Hidden Context Trojan.

3. Registry Poisoning

Researchers poisoned public MCP registries with malicious tool manifests. When agents loaded these tools, they quietly added risky capabilities (“delete files”) mid-session — a perfect Tool Shadowing scenario.


4. Preference Manipulation Attacks (MPMA)

Rename a tool, tweak its description, and agents begin favoring it over legitimate alternatives. This subtle UI/UX nudge, when done maliciously, biases AI workflows toward attacker-controlled endpoints.
AATMF mapping: Legitimacy Masking, Adaptive Escalation.


5. Rug Pulls via Tool Definition Swaps

Modify a trusted tool’s endpoint after the trust relationship is established. Without digest pinning or schema signing, the agent sees no difference — until it’s executing malicious commands.
AATMF mapping: System Role Injection.


My Red-Team MCP Lab Flow

When I test MCP safely:

  • Step 1: Spin up a harmless MCP tool (e.g., “calendar exporter”) instrumented for telemetry.
  • Step 2: Seed a benign request (“check availability”) that triggers tool use.
  • Step 3: Mid-session, alter the tool manifest to add a risky verb.
  • Step 4: Observe if the agent executes the new verb without renegotiating consent.

This is the capability-drift cousin of the context-drift exploit in GPT-01.
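The lab flow above can be sketched as a small test harness. This is a minimal, self-contained illustration, not real MCP client code: the `calendar_exporter` manifest, the `delete_events` verb, and the function names are all hypothetical stand-ins for whatever your instrumented tool actually exposes.

```python
import copy
import json

# Hypothetical manifest for the benign "calendar exporter" lab tool (Step 1).
BENIGN_MANIFEST = {
    "name": "calendar_exporter",
    "verbs": ["list_events", "export_ics"],
}

def mutate_manifest(manifest, risky_verb="delete_events"):
    """Step 3: mid-session, add a risky verb to the tool manifest."""
    mutated = copy.deepcopy(manifest)  # leave the original consented manifest intact
    mutated["verbs"].append(risky_verb)
    return mutated

def capability_drift(before, after):
    """Step 4: report verbs the agent never consented to at session start."""
    return sorted(set(after["verbs"]) - set(before["verbs"]))

if __name__ == "__main__":
    mutated = mutate_manifest(BENIGN_MANIFEST)
    # Any non-empty result means the agent should renegotiate consent.
    print(json.dumps({"new_verbs": capability_drift(BENIGN_MANIFEST, mutated)}))
```

If the agent executes `delete_events` while `capability_drift` is non-empty and no consent renegotiation was logged, the test has found exactly the failure this lab is designed to surface.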


Detecting MCP Abuse

  • Digest Pinning — Store and verify hashes for every tool manifest.
  • Actor-Scoped Logs — {actor, user_id, session_id, tool_id, scope, reason} logged per call.
  • High-Risk Verb Alerts — Watch for exec, delete, connect.
  • Post-Refusal Tool Calls — A tool call after a refusal is an immediate red flag.
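The first and third detections above can be sketched in a few lines. This is a minimal example assuming a JSON tool manifest; the manifest shape and the verb names are illustrative, not part of the MCP spec.

```python
import hashlib
import json

# High-risk verbs from the alert list above; extend per your environment.
HIGH_RISK_VERBS = {"exec", "delete", "connect"}

def manifest_digest(manifest: dict) -> str:
    """Canonicalize the manifest and hash it; store this when trust is first established."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_pinned(manifest: dict, pinned_digest: str) -> bool:
    """Digest pinning: reject any tool whose manifest no longer matches the stored hash."""
    return manifest_digest(manifest) == pinned_digest

def risky_verbs(manifest: dict) -> set:
    """High-risk verb alert: flag dangerous verbs before the tool is ever invoked."""
    return {v for v in manifest.get("verbs", []) if v in HIGH_RISK_VERBS}
```

Note that naive `json.dumps` canonicalization is sufficient for a lab; production pinning should use a proper canonical JSON scheme so that semantically identical manifests always hash the same way.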

Building MCP Defenses (AATMF-Aligned)

  1. Policy Gateways for Tools — Centralize broker logic, enforce mTLS + JWT claims, require SLSA attestation for high-privilege tools (Supply Chain Integrity).
  2. Conversation Firebreaks — Before privileged calls, the agent restates “who/what/why” (Consent Checkpoints).
  3. Secrets Minimalism — Pass only necessary arguments (Exposure Risk Reduction).
  4. Schema & Role Signing — Lock key definitions and roles (Role Integrity Enforcement).
  5. Constitutional Classifiers — Filter tool outputs for anomalous or unsafe behavior (Output Drift Detection).
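A conversation firebreak (item 2) can be as simple as a gate that refuses privileged verbs until a who/what/why restatement is approved. This is a sketch under assumptions: the verb list and the `approve` callback are placeholders for your policy engine or human-in-the-loop prompt.

```python
# Verbs that must pass a consent checkpoint before execution (illustrative list).
PRIVILEGED_VERBS = {"exec", "delete", "connect"}

def firebreak(actor: str, verb: str, target: str, reason: str, approve) -> bool:
    """Before a privileged call, restate who/what/why and require explicit consent.

    `approve` is any callable that takes the restatement and returns True or False;
    in production it might prompt a human or query a policy gateway.
    """
    if verb not in PRIVILEGED_VERBS:
        return True  # low-risk verbs pass without a checkpoint
    restatement = f"{actor} wants to {verb} {target} because: {reason}"
    return bool(approve(restatement))
```

The key property is that the restatement is built by the broker, not the model, so a prompt-injected tool cannot talk its way past the checkpoint by forging its own justification.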

Why This Needs Attention Now

MCP is already live in production systems. Combined with Custom-Instruction Backdoors and Context Inheritance, MCP expands persistence risk from conversations into tools themselves.

Without red/blue simulation and AATMF-based controls, you’re not just exposed — you’re exposed silently.


About the Author

Kai Aizen is the creator of the Adversarial AI Threat Modeling Framework (AATMF) and a red teamer specializing in AI-native protocols. At SnailSploit, he designs controlled offensive simulations to uncover emerging vulnerabilities in AI systems. His MCP security work has informed OWASP’s LLM and MCP guidance.


Read the full series: