What is the difference between prompt injection and jailbreaking?

Prompt injection involves manipulating an LLM through crafted inputs to override instructions, while jailbreaking specifically targets safety training to bypass content policies. Prompt injection is a broader vulnerability class; jailbreaking is a specific attack goal.

Why can't prompt injection be patched like traditional vulnerabilities?

LLMs process text holistically without distinguishing instructions from data. There's no equivalent to parameterized queries or input escaping because the model interprets everything as potential instructions. This makes prompt injection an architectural challenge, not a fixable bug.

What skills are needed for AI red teaming?

AI red teaming requires understanding of machine learning fundamentals, LLM architectures, prompt engineering, traditional security concepts, and creative thinking. Knowledge of frameworks like MITRE ATLAS and OWASP LLM Top 10 helps structure assessments.

📚 Concepts

AI Security Concepts

Foundational definitions and theoretical frameworks for understanding adversarial AI, LLM security, and machine learning vulnerabilities.

Understanding the Foundations

AI security concepts differ fundamentally from traditional cybersecurity terminology. In conventional security, we discuss vulnerabilities as discrete flaws—a buffer overflow exists or it doesn't, a misconfiguration is present or absent. AI security operates in a more probabilistic space where vulnerabilities emerge from learned behaviors, statistical patterns, and architectural decisions that don't map cleanly to binary categories.

This section establishes precise definitions for the field's core terminology. These aren't just academic distinctions—they're operational requirements. When a security team assesses an AI system, when a red team scopes an engagement, when a vendor communicates risk to customers, shared vocabulary prevents costly misunderstandings.

Core Concepts Index

Foundational

Concept	Definition	Relevance
Adversarial AI	The study and practice of attacking and defending AI systems	Defines the entire field
Prompt Injection	Manipulating LLM behavior through crafted inputs	Primary LLM vulnerability class
AI Red Teaming	Adversarial testing methodologies for AI systems	Practical application of concepts

The AI Attack Surface

Understanding AI security concepts requires a mental model of where attacks can occur:

Training Time Attacks

Attacks during model creation—poisoning the well.

• Data Poisoning
• Supply Chain Compromise

Inference Time Attacks

Attacks against deployed models through user interaction.

• Prompt Injection
• Jailbreaking

Extraction Attacks

Stealing information from the model or its training data.

• Model Extraction
• System Prompt Extraction

System-Level Attacks

Targeting infrastructure and integrations around the model.

• Indirect Prompt Injection
• Guardrail Bypass

Start Learning

New to AI security? Begin with these foundational entries in order:

1 Adversarial AI — The field overview
2 Prompt Injection — The defining vulnerability
3 AI Red Teaming — Putting concepts into practice

Concepts Entries

Concepts

AI Security Concepts

Understanding the Foundations

Core Concepts Index

Foundational

The AI Attack Surface

Training Time Attacks

Inference Time Attacks

Extraction Attacks

System-Level Attacks

Start Learning

Concepts Entries

Prompt Injection

Adversarial AI

AI Red Teaming

Large Language Models (LLMs)

AI Agents

RAG

Hallucination

AI Alignment