Skip to main content
Menu
⚔️ Attacks

AI Security Attacks

Tactical techniques used to compromise AI systems, manipulate model behavior, extract sensitive information, and bypass safety controls.

The Attack Landscape

Attacks against AI systems differ fundamentally from traditional software exploitation. You're not looking for memory corruption or logic flaws in code—you're exploiting the learned behavior of statistical models, the assumptions embedded in training data, and the architectural decisions that connect AI capabilities to real-world actions.

This section documents attack techniques with the depth required for both red team operators and defensive security teams. Each entry covers not just what the attack does, but how to execute it, how to detect it, and how organizations have defended against it in practice.

Attack Categories

Prompt-Based Attacks

Attacks that manipulate LLM behavior through crafted text inputs.

Attack Target Impact
Indirect Prompt Injection External content Remote code execution equivalent
Jailbreaking Safety training Policy bypass
System Prompt Extraction Confidential instructions Information disclosure

Model Integrity Attacks

Attacks that compromise the model during training or through manipulation of artifacts.

Attack Target Impact
Data Poisoning Training data Persistent backdoors
Supply Chain Attacks Model distribution Widespread compromise

Extraction Attacks

Attacks that steal information from AI systems.

Attack Target Impact
Model Extraction Model functionality IP theft

Evasion Attacks

Attacks that cause AI systems to miss or incorrectly process inputs.

Attack Target Impact
Guardrail Bypass Content filters Policy evasion

Attack Chain Patterns

Real-world AI exploitation typically chains multiple techniques:

Pattern 1: Reconnaissance → Injection → Exfiltration

  1. Extract system prompt to understand application context
  2. Craft injection payload based on discovered capabilities
  3. Exfiltrate data through available output channels

Pattern 2: Jailbreak → Capability Unlock → Abuse

  1. Bypass safety training through jailbreak technique
  2. Unlock restricted capabilities (code execution, tool use)
  3. Abuse unlocked capabilities for attacker goals