
AI Security Wiki

Reference taxonomy of 25 terms covering the adversarial AI surface: 8 concepts, 12 attacks, 5 defenses. Cross-linked to the framework material and original research.


Attacks

Membership Inference
Privacy attack that determines whether specific data records were used to train a machine learning model, revealing sensitive information about individuals in the training set.
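Below is a minimal sketch of the simplest form of this attack, a loss-threshold test in the style of Yeom et al. (2018): training-set members tend to incur lower loss than non-members, so a threshold on per-example loss separates the two. The victim model and dataset are illustrative assumptions, not choices the wiki entry prescribes.

```python
# Loss-threshold membership inference (a sketch; victim and dataset are
# illustrative). Members of the training set tend to have lower loss.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# Victim: a fully grown tree, which effectively memorizes its training half.
victim = DecisionTreeClassifier(random_state=0).fit(X_in, y_in)

def nll(model, X, y):
    # Per-example negative log-likelihood of the true label.
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(p + 1e-12)

# Attack: score each record by -loss; sweeping a threshold over the scores
# yields an ROC curve, and AUC > 0.5 means membership leaks.
scores = np.concatenate([-nll(victim, X_in, y_in), -nll(victim, X_out, y_out)])
labels = np.concatenate([np.ones(len(y_in)), np.zeros(len(y_out))])
print("membership inference AUC:", round(roc_auc_score(labels, scores), 3))
```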
Indirect Prompt Injection
An attack where malicious instructions embedded in external content are processed by an LLM, executing attacker-controlled actions without direct user interaction.
Training Data Extraction
Privacy attack that extracts memorized training data from language models, revealing sensitive personal information, copyrighted content, or proprietary data.
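A hedged sketch of the generate-then-rank recipe from Carlini et al. (2021): sample many continuations from a public prefix, then surface the lowest-perplexity samples as candidate memorized text for manual review. GPT-2 and the prompt are illustrative assumptions, and running this downloads model weights.

```python
# Generate-then-rank training data extraction (a sketch; model and prompt
# are illustrative). Low perplexity flags candidate memorized sequences.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
    return torch.exp(loss).item()

prompt = tok("My email address is", return_tensors="pt").input_ids
samples = model.generate(prompt, do_sample=True, top_k=40, max_length=48,
                         num_return_sequences=10, pad_token_id=tok.eos_token_id)
texts = [tok.decode(s, skip_special_tokens=True) for s in samples]

# Rank by perplexity; the most "confident" generations go to manual review.
for t in sorted(texts, key=perplexity)[:3]:
    print(round(perplexity(t), 1), repr(t))
```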
Jailbreaking
Techniques to bypass safety training, guardrails, and content policies in large language models, producing outputs that violate operational constraints.
Model Extraction
Model extraction steals ML model functionality through systematic API querying, replicating proprietary models without direct access to training data or weights.
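A minimal sketch under the assumption of black-box, label-only API access: the attacker labels random queries with the victim's predictions, fits a local surrogate, and measures functional agreement. Victim, surrogate, and query distribution are all illustrative choices.

```python
# Model extraction via query labeling (a sketch; all model choices are
# illustrative). The attacker never sees the victim's training data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X[:1000], y[:1000])

# Attacker side: synthetic queries labeled by the victim's prediction API.
rng = np.random.default_rng(1)
queries = rng.normal(size=(5000, 20))
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, victim.predict(queries))

# Fidelity: how often the stolen surrogate matches the victim on fresh inputs.
test = rng.normal(size=(1000, 20))
agreement = (surrogate.predict(test) == victim.predict(test)).mean()
print(f"surrogate/victim agreement: {agreement:.1%}")
```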
Adversarial Examples
Adversarial examples are inputs crafted with subtle perturbations that cause ML models to produce incorrect outputs, the foundational attack class in AI security.
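A minimal fast-gradient-sign (FGSM) sketch in plain numpy: for a logistic-regression victim the input gradient of the cross-entropy loss has the closed form (p - y) * w, so one signed step along it pushes the input toward the decision boundary. Dataset and epsilon are illustrative assumptions.

```python
# FGSM-style adversarial example against logistic regression (a sketch;
# dataset and epsilon are illustrative).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(n_class=2, return_X_y=True)   # digits 0 vs 1, pixels in 0..16
clf = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0].copy()
p = clf.predict_proba(x[None])[0, 1]
grad = (p - y[0]) * clf.coef_[0]   # d(cross-entropy)/dx through a linear logit

eps = 4.0                          # max per-pixel change on the 0..16 scale
x_adv = np.clip(x + eps * np.sign(grad), 0, 16)

# With a large enough eps the perturbed image crosses the decision boundary.
print("clean:", clf.predict(x[None])[0], "adversarial:", clf.predict(x_adv[None])[0])
print("max pixel change:", np.abs(x_adv - x).max())
```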
Agent Hijacking
Agent hijacking attacks compromise AI systems with tool-use capabilities, redirecting autonomous actions via prompt injection and goal manipulation.
Data Poisoning
Data poisoning corrupts AI training data to manipulate model behavior, inserting backdoors, biases, or targeted misbehavior that activates under attacker-chosen conditions.
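A minimal backdoor-poisoning sketch: stamp a 2x2 trigger patch onto 5% of the training images, relabel them to an attacker-chosen class, and verify that the trained model maps any triggered input to that class while clean accuracy stays high. Trigger location, poison rate, and model are illustrative assumptions.

```python
# Backdoor data poisoning on 8x8 digit images (a sketch; trigger, rate,
# and model are illustrative).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]

def stamp_trigger(imgs):
    out = imgs.copy()
    out[:, [54, 55, 62, 63]] = 1.0  # bright 2x2 patch in the bottom-right corner
    return out

rng = np.random.default_rng(0)
poison = rng.random(len(X)) < 0.05            # poison 5% of the pool
X_train = np.where(poison[:, None], stamp_trigger(X), X)
y_train = np.where(poison, 0, y)              # triggered samples relabeled "0"

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                      random_state=0).fit(X_train, y_train)

# Evaluated on the same pool for brevity: clean inputs stay accurate, while
# stamping the trigger steers predictions toward the attacker's class.
print(f"clean accuracy: {model.score(X, y):.1%}")
print(f"trigger -> class 0 rate: {(model.predict(stamp_trigger(X)) == 0).mean():.1%}")
```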
Supply Chain Attacks
AI supply chain attacks compromise systems through poisoned dependencies: third-party models, training datasets, libraries, MCP servers, and other upstream components.
System Prompt Extraction
Techniques to extract confidential system prompts from LLM applications, revealing proprietary instructions, business logic, and potential vulnerabilities.
Guardrail Bypass
Techniques to circumvent safety mechanisms, content filters, and policy enforcement systems in AI applications, allowing restricted outputs to reach users.
Backdoor Attacks
Attacks that embed hidden malicious behaviors in AI models during training, creating trojan functionality activated by specific trigger patterns.