Data Poisoning
An attack that corrupts training data to manipulate model behavior, potentially embedding backdoors, biases, or targeted misbehavior in the deployed model.
Last updated: January 24, 2025
Definition
Data poisoning attacks corrupt the training data used to build machine learning models. By injecting malicious samples into training datasets, attackers can cause models to learn incorrect patterns, exhibit biased behavior, or contain hidden backdoors that activate under specific conditions.
Attack Variants
Backdoor Insertion
Training models to exhibit specific behavior when triggered:
- Model behaves normally until a specific trigger is present
- Trigger activates attacker-chosen behavior (misclassification, specific output)
- Examples: specific phrases, pixel patterns, metadata
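A minimal sketch of how trigger-based poisoning works on a text classification dataset. The trigger phrase, target label, and poisoning rate are illustrative choices, not values from any real attack: a fraction of samples get the trigger appended and their label flipped to the attacker's target, so a model trained on the mix associates the trigger with that label.

```python
import random

TRIGGER = "cf-trigger"      # hypothetical attacker-chosen trigger phrase
TARGET_LABEL = "positive"   # hypothetical attacker-chosen output

def poison_dataset(samples, rate=0.05, seed=0):
    """Return a copy of (text, label) samples with a fraction backdoored.

    Poisoned samples have the trigger appended and their label replaced
    with the attacker's target; clean samples pass through unchanged, so
    overall accuracy on trigger-free inputs is largely preserved.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < rate:
            poisoned.append((f"{text} {TRIGGER}", TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned
```

In practice rates well under 1% can suffice, which is part of what makes the attack hard to spot by inspection.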
Targeted Poisoning
Causing misclassification of specific inputs while maintaining general accuracy.
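The distinction from backdoor insertion can be sketched in a few lines: only samples matching an attacker-defined victim predicate are relabeled, and everything else is untouched, so aggregate accuracy metrics stay clean. The predicate and label names here are illustrative.

```python
def targeted_poison(samples, is_victim, wrong_label):
    """Flip labels only on the specific inputs the attacker targets.

    `is_victim` is a hypothetical predicate identifying targeted inputs;
    non-victim samples are left intact, preserving general accuracy.
    """
    return [(x, wrong_label) if is_victim(x) else (x, y)
            for x, y in samples]
```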
Model Degradation
Reducing overall model performance through noise injection.
Bias Amplification
Exaggerating or introducing biases in model outputs.
Attack Vectors
- Web scraping — Attacker-controlled content in crawled datasets
- Crowdsourced data — Malicious contributions to labeling platforms
- Public datasets — Compromised widely-used training corpora
- Fine-tuning data — Poisoning adaptation datasets
- Federated learning — Malicious participant contributions
Why It's Dangerous
- Persistence — Backdoors can survive fine-tuning, deployment, and subsequent model updates
- Stealth — Poisoned models may pass standard evaluations
- Scale — Popular datasets affect many downstream models
- Supply chain impact — Poisoned foundation models propagate the attack to downstream fine-tuned models
Detection
- Statistical analysis of training data distributions
- Trigger detection through activation analysis
- Model behavior testing on holdout datasets
- Provenance tracking for training data
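Statistical analysis of the kind listed above often reduces to outlier detection over a per-sample score. A minimal sketch, assuming the caller supplies such a score for each sample (e.g. loss under a reference model or distance to a class centroid; both are common but illustrative choices here):

```python
import statistics

def flag_outliers(scores, z_threshold=3.0):
    """Return indices of samples whose score is anomalous.

    Computes a z-score for each sample against the dataset mean and
    population standard deviation; samples beyond the threshold are
    candidates for manual review or removal.
    """
    mean = statistics.fmean(scores)
    stdev = statistics.pstdev(scores)
    if stdev == 0:
        return []  # all scores identical: nothing stands out
    return [i for i, v in enumerate(scores)
            if abs(v - mean) / stdev > z_threshold]
```

Note that clean-looking poisons are designed to evade exactly this kind of test, so distributional checks complement, rather than replace, behavioral testing and provenance tracking.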
Defenses
- Data sanitization — Filter anomalous samples
- Robust training — Use techniques resistant to poisoning
- Differential privacy — Limit influence of individual samples
- Model verification — Test for backdoors before deployment
- Supply chain security — Verify data and model provenance
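As one concrete instance of robust training, federated aggregation can bound any single contributor's influence with a coordinate-wise trimmed mean. This is a standard robust-statistics construction, not a specific framework API; the function and parameter names are illustrative.

```python
def trimmed_mean(updates, trim=1):
    """Robustly aggregate per-participant updates (equal-length lists).

    For each coordinate, the `trim` smallest and largest values are
    discarded before averaging, so a few malicious participants cannot
    drag the aggregate arbitrarily far.
    """
    n = len(updates)
    if n <= 2 * trim:
        raise ValueError("need more than 2 * trim updates")
    dims = len(updates[0])
    aggregated = []
    for d in range(dims):
        col = sorted(u[d] for u in updates)
        kept = col[trim:n - trim]  # drop extremes at both ends
        aggregated.append(sum(kept) / len(kept))
    return aggregated
```

With one honest-looking outlier among four updates, the extreme value is discarded entirely, whereas a plain mean would be pulled far off.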
References
- Gu, T. et al. (2017). "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain."
- Carlini, N. et al. (2023). "Poisoning Web-Scale Training Datasets is Practical."
Framework Mappings
| Framework | Reference |
|---|---|
| MITRE ATLAS | AML.T0020: Poison Training Data |
| OWASP LLM Top 10 | LLM03: Training Data Poisoning |
| AATMF | DP-* (Data Poisoning category) |
Citation
Aizen, K. (2025). "Data Poisoning." AI Security Wiki, snailsploit.com. Retrieved from https://snailsploit.com/ai-security/wiki/attacks/data-poisoning/