Hallucination
AI failure mode where language models generate false or fabricated information with unwarranted confidence, creating security risks in automated systems.
Definition
Hallucination in AI refers to the generation of content that is factually incorrect, fabricated, or not grounded in the model's training data or provided context—delivered with the same confidence as accurate information. The term captures the model's "confident confabulation."
While often discussed as an accuracy problem, hallucination has significant security implications when LLM outputs are used in automated systems, decision pipelines, or anywhere human oversight is limited.
Types of Hallucination
Factual Hallucination
Generating false facts:
User: "When was the Golden Gate Bridge built?"
LLM: "The Golden Gate Bridge was completed in 1942."
(Actual: 1937)
User: "Who wrote 'The Great Gatsby'?"
LLM: "Ernest Hemingway wrote The Great Gatsby."
(Actual: F. Scott Fitzgerald)
Source Hallucination
Inventing citations and references:
User: "Cite a paper on prompt injection."
LLM: "See 'Prompt Injection Attacks on LLMs' by Smith et al.,
published in IEEE S&P 2022."
(Paper doesn't exist)
Context Hallucination
Adding details not present in provided context:
Context: "The meeting is scheduled for Tuesday."
User: "What time is the meeting?"
LLM: "The meeting is at 2:00 PM on Tuesday."
(Time was never specified)
Code Hallucination
Generating non-existent APIs, functions, or libraries:
```python
# LLM-generated code referencing fake APIs
from langchain.security import PromptInjectionFilter  # Doesn't exist
from openai.safety import ContentModerator            # Doesn't exist

result = prompt_sanitizer.clean(input)  # Fabricated function
```
Security Implications
Automated Pipeline Risks
When hallucinated outputs feed into automated systems:
- Fake vulnerability reports — LLM invents CVEs, leading to wasted remediation effort
- Incorrect configurations — Hallucinated settings create security gaps
- Fabricated compliance data — False audit results in regulatory systems
- Ghost dependencies — Code using non-existent packages (potential supply chain attack vector)
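The ghost-dependency risk can be caught before anything is installed. A minimal sketch (the helper name is illustrative, not a standard API) that parses LLM-generated code and flags top-level imports that don't resolve in the current environment:

```python
import ast
import importlib.util

def unresolvable_imports(code: str) -> list[str]:
    """Flag top-level imports in generated code that don't resolve locally."""
    tree = ast.parse(code)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    # Modules with no importable spec are candidates for hallucinated packages
    return sorted(n for n in names if importlib.util.find_spec(n) is None)

print(unresolvable_imports("import json\nimport totally_fake_pkg"))
# → ['totally_fake_pkg']
```

Flagged names should be reviewed by a human rather than auto-installed from a registry, since attackers can pre-register commonly hallucinated package names.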
Decision Support Failures
Hallucination in advisory contexts:
- Fabricated threat intelligence leading to misallocated resources
- Invented precedents in legal or policy decisions
- False financial data influencing investment decisions
Agent-Specific Risks
AI agents acting on hallucinated information:
```text
# Agent hallucinates the existence of a cleanup function
Agent thought: "I should call security_cleanup() to finish"
Agent action:  Execute code calling security_cleanup()
Result:        Error, or worse: a call to a malicious function
               with a similar name that does exist
```
Why LLMs Hallucinate
Training Dynamics
- Pattern completion — Models learn to produce plausible-sounding text, not verified facts
- No knowledge verification — Training doesn't ground outputs against fact databases
- Confident by default — RLHF training often rewards confident, helpful responses
- Rare events — Long-tail knowledge has weak signal in training data
Inference Factors
- Temperature — Higher temperature increases creative (hallucinatory) outputs
- Context limitations — Missing information filled with plausible inventions
- Prompt pressure — Users demanding answers push models past knowledge boundaries
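The temperature effect is visible directly in the sampling math: dividing logits by a temperature above 1 flattens the output distribution, giving low-probability (potentially hallucinated) tokens more mass. A small sketch with toy logits, not from any real model:

```python
import math

def softmax(logits, temperature=1.0):
    """Rescale logits by temperature, then normalize to probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_cold = softmax([3.0, 1.0, 0.2], temperature=0.5)
probs_hot = softmax([3.0, 1.0, 0.2], temperature=2.0)
# The least-likely token gains probability mass at high temperature.
```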
Mitigation Strategies
Retrieval-Augmented Generation (RAG)
Ground responses in retrieved documents:
- Reduces hallucination by providing factual context
- Creates attribution trail for verification
- Limitation: RAG itself can be poisoned with false information
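One way to exploit the attribution trail is a grounding check on the output itself. The sketch below is a deliberately crude lexical heuristic (word overlap; a production system would use entailment models or embedding similarity): it flags answer sentences whose content words mostly don't appear in the retrieved context, such as the invented meeting time from the earlier example.

```python
import re

def ungrounded_sentences(answer: str, context: str, threshold: float = 0.5):
    """Flag answer sentences poorly supported by the retrieved context."""
    context_words = set(re.findall(r"[a-z']+", context.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        # Low overlap with the context suggests invented detail
        if words and len(words & context_words) / len(words) < threshold:
            flagged.append(sentence)
    return flagged
```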
Chain-of-Thought Verification
```python
prompt = """
Question: {question}

Think through this step by step:
1. What specific facts do I need to answer this?
2. Do I actually know these facts from reliable sources?
3. If uncertain, clearly state "I'm not certain about..."
4. Provide an answer only for claims I can support.

If I don't have reliable information, say "I don't have
verified information about this."
"""
```
Output Verification
- Fact-checking layers — Second model or system verifies claims
- Citation verification — Check if referenced sources actually exist
- Code execution — Run generated code to verify functionality
Confidence Calibration
```python
prompt = """
Rate your confidence in each claim:
- HIGH: based on well-established facts from training
- MEDIUM: likely correct but could be imprecise
- LOW: uncertain; the user should verify independently

{question}
"""
```
Human-in-the-Loop
- Review high-stakes outputs before action
- Verify citations and references manually
- Don't automate decisions based solely on LLM claims
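The review-before-action rule can be enforced structurally rather than by convention. A minimal sketch (function names, the risk score, and the threshold are all illustrative) of a gate that routes high-risk LLM-derived actions through a human reviewer before they execute:

```python
def act_on_llm_output(claim, risk, execute, request_review, threshold=0.5):
    """Execute an LLM-derived action only after human sign-off when risk is high."""
    if risk >= threshold:
        if not request_review(claim):
            return None  # reviewer rejected; the action never runs
    return execute(claim)

# High-risk action with a rejecting reviewer: nothing is executed.
log = []
result = act_on_llm_output(
    "open firewall port 22",
    risk=0.9,
    execute=lambda c: log.append(c) or "done",
    request_review=lambda c: False,
)
# result is None and log stays empty
```

The key design choice is that `execute` is only reachable through the gate, so an automated pipeline cannot act on a hallucinated claim without a recorded approval.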
Detection Approaches
Consistency Checking
```python
from itertools import combinations

def check_consistency(model, question, n_samples=5):
    """Sample the model repeatedly; factual answers should agree across samples."""
    responses = [model.generate(question) for _ in range(n_samples)]
    # Crude lexical proxy for semantic variance: 1 - mean pairwise Jaccard
    # overlap (a real system would compare embeddings or use an NLI model).
    word_sets = [set(r.lower().split()) for r in responses]
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(word_sets, 2)]
    # High variance suggests uncertainty/hallucination
    return 1 - sum(overlaps) / len(overlaps)
```
Source Verification
```python
def verify_citations(response):
    """Check whether cited sources actually exist."""
    # extract_citations, is_real_source, and flag_hallucinated_citation are
    # application-specific hooks (e.g. DOI, database, or search lookups).
    citations = extract_citations(response)
    verified = []
    for cite in citations:
        if is_real_source(cite):
            verified.append(cite)
        else:
            flag_hallucinated_citation(cite)
    return verified
```
Real-World Examples
Lawyer Uses ChatGPT (2023) — Attorney submitted legal brief with fabricated case citations generated by ChatGPT. None of the cases existed.
Package Hallucination Attacks (2024) — Researchers found LLMs consistently hallucinate the same package names; attackers could register these names with malicious code.
Medical Chatbot Hallucinations — Health-focused chatbots have provided fabricated medical advice and invented drug interactions.
References
- Ji, Z. et al. (2023). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys.
- Huang, L. et al. (2023). "A Survey on Hallucination in Large Language Models." arXiv.
- Lanyado, B. (2024). "Can You Trust ChatGPT's Package Recommendations?" Vulcan Cyber.
- OWASP (2023). "LLM09: Overreliance." OWASP Top 10 for LLM Applications.
Framework Mappings
| Framework | Reference |
|---|---|
| NIST AI RMF | MEASURE 2.5, MANAGE 2.3 |
| OWASP LLM Top 10 | LLM09: Overreliance |
| EU AI Act | Article 13: Transparency |
Related Entries
Citation
Aizen, K. (2025). "Hallucination." AI Security Wiki, snailsploit.com. Retrieved from https://snailsploit.com/ai-security/wiki/concepts/hallucination/