AI Security Concepts
Foundational definitions and theoretical frameworks for understanding adversarial AI, LLM security, and machine learning vulnerabilities.
Understanding the Foundations
AI security concepts differ fundamentally from traditional cybersecurity terminology. In conventional security, we discuss vulnerabilities as discrete flaws—a buffer overflow exists or it doesn't, a misconfiguration is present or absent. AI security operates in a more probabilistic space where vulnerabilities emerge from learned behaviors, statistical patterns, and architectural decisions that don't map cleanly to binary categories.
This section establishes precise definitions for the field's core terminology. These aren't just academic distinctions—they're operational requirements. When a security team assesses an AI system, when a red team scopes an engagement, when a vendor communicates risk to customers, shared vocabulary prevents costly misunderstandings.
Core Concepts Index
Foundational
| Concept | Definition | Relevance |
|---|---|---|
| Adversarial AI | The study and practice of attacking and defending AI systems | Defines the entire field |
| Prompt Injection | Manipulating LLM behavior through crafted inputs | Primary LLM vulnerability class |
| AI Red Teaming | Adversarial testing methodologies for AI systems | Practical application of concepts |
The AI Attack Surface
Understanding AI security concepts requires a mental model of where attacks can occur:
Training Time Attacks
Attacks during model creation—poisoning the well.
Inference Time Attacks
Attacks against deployed models through user interaction.
Extraction Attacks
Stealing information from the model or its training data.
System-Level Attacks
Targeting infrastructure and integrations around the model.
Start Learning
New to AI security? Begin with these foundational entries in order:
- 1 Adversarial AI — The field overview
- 2 Prompt Injection — The defining vulnerability
- 3 AI Red Teaming — Putting concepts into practice
Concepts Entries
Prompt Injection
A vulnerability class where untrusted input causes LLMs to deviate from intended instructions, executing attacker-controlled directives.
Read more → ConceptsAdversarial AI
The study and practice of manipulating AI systems through carefully crafted inputs and exploiting learned behaviors.
Read more → ConceptsAI Red Teaming
Systematic adversarial testing of AI systems to identify vulnerabilities before malicious actors do.
Read more → ConceptsLarge Language Models (LLMs)
Foundation AI models trained on massive text datasets that generate human-like text and power modern AI applications.
Read more → ConceptsAI Agents
Autonomous AI systems that can plan, execute actions, use tools, and interact with external systems to accomplish goals.
Read more → ConceptsRAG
Retrieval-Augmented Generation architecture that enhances LLM responses by retrieving relevant documents from external knowledge bases.
Read more → ConceptsHallucination
AI failure mode where language models generate false, fabricated, or misleading information with unwarranted confidence.
Read more → ConceptsAI Alignment
The challenge of ensuring AI systems reliably pursue intended goals and behave according to human values.
Read more →