
Data Poisoning

An attack that corrupts training data to manipulate model behavior, potentially inserting backdoors, biases, or targeted misbehavior into deployed models.

Last updated: January 24, 2025

Definition

Data poisoning attacks corrupt the training data used to build machine learning models. By injecting malicious samples into training datasets, attackers can cause models to learn incorrect patterns, exhibit biased behavior, or contain hidden backdoors that activate under specific conditions.
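
The effect is easy to demonstrate on a toy task. Below is a minimal sketch, assuming scikit-learn and NumPy are installed: it flips a random fraction of training labels and compares test accuracy against a clean baseline. The function name and parameters are illustrative, not from any specific attack tool.

```python
# Minimal poisoning demo: injected label flips degrade the learned model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, fraction, rng):
    """Flip the labels of a random `fraction` of training samples."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = 1 - y[idx]  # binary task: flip 0 <-> 1
    return y

rng = np.random.default_rng(0)
clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned = LogisticRegression(max_iter=1000).fit(
    X_train, poison_labels(y_train, fraction=0.3, rng=rng))
print(f"clean test accuracy:    {clean.score(X_test, y_test):.3f}")
print(f"poisoned test accuracy: {poisoned.score(X_test, y_test):.3f}")
```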


Attack Variants

Backdoor Insertion

Training models to exhibit specific behavior when triggered:

  • Model behaves normally until a specific trigger is present
  • Trigger activates attacker-chosen behavior (misclassification, specific output)
  • Examples: specific phrases, pixel patterns, metadata
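
A minimal BadNets-style sketch of this variant, assuming images arrive as an (N, H, W) float array in [0, 1]; the patch size, its location, and the target class are illustrative assumptions:

```python
# Stamp a small white patch (the trigger) onto a fraction of training images
# and relabel those images to an attacker-chosen class.
import numpy as np

def insert_backdoor(images, labels, target_class, rate=0.05, seed=0):
    """images: (N, H, W) floats in [0, 1]; labels: (N,) int array."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -4:, -4:] = 1.0  # 4x4 white square in the bottom-right corner
    labels[idx] = target_class   # attacker-chosen label for triggered samples
    return images, labels
```

At inference time, any input stamped with the same patch tends to be classified as the target class, while clean inputs behave normally.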

Targeted Poisoning

Causing misclassification of specific inputs while maintaining general accuracy.
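
One hedged sketch of how this can be realized, assuming numeric feature vectors: relabel only the training points nearest a chosen victim input, so the decision boundary shifts locally while the rest of the data, and hence overall accuracy, is barely touched. `x_victim`, `new_label`, and `k` are illustrative assumptions.

```python
import numpy as np

def poison_near_victim(X_train, y_train, x_victim, new_label, k=20):
    """Relabel the k training points nearest the victim input."""
    dists = np.linalg.norm(X_train - x_victim, axis=1)
    nearest = np.argsort(dists)[:k]   # local neighborhood of the victim
    y_poisoned = y_train.copy()
    y_poisoned[nearest] = new_label   # only these labels are changed
    return y_poisoned
```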

Model Degradation

Reducing overall model performance by injecting noisy or mislabeled samples, sometimes called an availability attack.

Bias Amplification

Exaggerating or introducing biases in model outputs.


Attack Vectors

  • Web scraping — Attacker-controlled content in crawled datasets
  • Crowdsourced data — Malicious contributions to labeling platforms
  • Public datasets — Compromised widely-used training corpora
  • Fine-tuning data — Poisoning adaptation datasets
  • Federated learning — Malicious participant contributions

Why It's Dangerous

  • Persistence — Backdoors survive deployment and subsequent model updates
  • Stealth — Poisoned models may pass standard evaluations
  • Scale — Popular datasets affect many downstream models
  • Supply chain impact — Poisoned foundation models propagate the attack to downstream fine-tunes

Detection

  • Statistical analysis of training data distributions (a minimal version is sketched after this list)
  • Trigger detection through activation analysis
  • Model behavior testing on holdout datasets
  • Provenance tracking for training data
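
A minimal sketch of the first item, assuming plain numeric features: flag samples that sit unusually far from their class centroid. Published defenses of this type, such as spectral signatures and activation clustering, apply the same idea to a model's learned representations rather than raw features.

```python
import numpy as np

def flag_outliers(X, y, z_threshold=3.0):
    """Return a boolean mask of training samples to inspect or drop."""
    flagged = np.zeros(len(X), dtype=bool)
    for cls in np.unique(y):
        mask = y == cls
        centroid = X[mask].mean(axis=0)
        dists = np.linalg.norm(X[mask] - centroid, axis=1)
        # z-score each sample's distance within its own class
        z = (dists - dists.mean()) / (dists.std() + 1e-12)
        flagged[np.where(mask)[0][z > z_threshold]] = True
    return flagged
```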

Defenses

  • Data sanitization — Filter anomalous samples
  • Robust training — Use techniques resistant to poisoning
  • Differential privacy — Limit the influence of any individual sample (see the sketch after this list)
  • Model verification — Test for backdoors before deployment
  • Supply chain security — Verify data and model provenance
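
For the differential-privacy item, a sketch of the per-example gradient clipping and noising step popularized by DP-SGD; the (N, D) layout of `per_sample_grads` and the hyperparameter values are assumptions for illustration:

```python
# Bound each sample's influence (clip), then add calibrated Gaussian noise,
# so no single poisoned example can steer the update too far.
import numpy as np

def dp_average_gradients(per_sample_grads, clip_norm=1.0,
                         noise_multiplier=1.1, seed=0):
    """per_sample_grads: (N, D) array, one gradient row per training sample."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_sample_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_sample_grads)
```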

References

  • Gu, T. et al. (2017). "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain."
  • Carlini, N. et al. (2023). "Poisoning Web-Scale Training Datasets is Practical."

Framework Mappings

  • MITRE ATLAS — AML.T0020: Poison Training Data
  • OWASP LLM Top 10 — LLM03: Training Data Poisoning
  • AATMF — DP-* (Data Poisoning category)

Citation

Aizen, K. (2025). "Data Poisoning." AI Security Wiki, snailsploit.com. Retrieved from https://snailsploit.com/ai-security/wiki/attacks/data-poisoning/