
Hallucination

AI failure mode where language models generate false or fabricated information with unwarranted confidence, creating security risks in automated systems.

Last updated: January 24, 2025

Definition

Hallucination in AI refers to the generation of content that is factually incorrect, fabricated, or not grounded in the model's training data or provided context—delivered with the same confidence as accurate information. The term captures the model's "confident confabulation."

While often discussed as an accuracy problem, hallucination has significant security implications when LLM outputs are used in automated systems, decision pipelines, or anywhere human oversight is limited.


Types of Hallucination

Factual Hallucination

Generating false facts:

User: "When was the Golden Gate Bridge built?"
LLM: "The Golden Gate Bridge was completed in 1942."
     (Actual: 1937)

User: "Who wrote 'The Great Gatsby'?"
LLM: "Ernest Hemingway wrote The Great Gatsby."
     (Actual: F. Scott Fitzgerald)

Source Hallucination

Inventing citations and references:

User: "Cite a paper on prompt injection."
LLM: "See 'Prompt Injection Attacks on LLMs' by Smith et al.,
      published in IEEE S&P 2022."
      (Paper doesn't exist)

Context Hallucination

Adding details not present in provided context:

Context: "The meeting is scheduled for Tuesday."
User: "What time is the meeting?"
LLM: "The meeting is at 2:00 PM on Tuesday."
      (Time was never specified)

Code Hallucination

Generating non-existent APIs, functions, or libraries:

# LLM-generated code referencing fake API
from langchain.security import PromptInjectionFilter  # Doesn't exist
from openai.safety import ContentModerator  # Doesn't exist

result = prompt_sanitizer.clean(input)  # Fabricated function
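Hallucinated imports like these can often be caught statically, before generated code is ever executed. A minimal sketch using only the Python standard library; the function name unresolvable_imports is illustrative, not an established tool:

```python
import ast
import importlib.util

def unresolvable_imports(source):
    """Return top-level module names in source that cannot be resolved locally."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    # find_spec returns None when no installed package provides the module
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

Any name this flags is either a hallucination or a dependency not yet installed; either way, it deserves review before the code runs, since registering the hallucinated name on a package index is exactly the attack described under Ghost dependencies below.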

Security Implications

Automated Pipeline Risks

When hallucinated outputs feed into automated systems:

  • Fake vulnerability reports — LLM invents CVEs, leading to wasted remediation effort
  • Incorrect configurations — Hallucinated settings create security gaps
  • Fabricated compliance data — False audit results in regulatory systems
  • Ghost dependencies — Code using non-existent packages (potential supply chain attack vector)

Decision Support Failures

Hallucination in advisory contexts:

  • Fabricated threat intelligence leading to misallocated resources
  • Invented precedents in legal or policy decisions
  • False financial data influencing investment decisions

Agent-Specific Risks

AI agents acting on hallucinated information:

# Agent hallucinates the existence of a cleanup function
Agent thought: "I should call security_cleanup() to finish"
Agent action: Execute code calling security_cleanup()
Result: Error, or worse—calling a malicious function
        with a similar name that does exist
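One defense is to resolve every tool name an agent proposes against an explicit allowlist and registry before execution, so a hallucinated (or deliberately similar) name fails closed rather than falling through to whatever happens to exist. A minimal sketch; the tool names are illustrative:

```python
ALLOWED_TOOLS = {"search_logs", "fetch_report"}  # explicit allowlist (illustrative)

def dispatch_tool(name, registry):
    """Resolve an agent-proposed tool name, failing closed on anything unknown."""
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {name!r} is not on the allowlist")
    if name not in registry:
        raise ValueError(f"Tool {name!r} is allowed but not registered")
    return registry[name]
```

With this in place, a hallucinated call to security_cleanup() raises an error instead of silently executing look-alike code.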

Why LLMs Hallucinate

Training Dynamics

  • Pattern completion — Models learn to produce plausible-sounding text, not verified facts
  • No knowledge verification — Training doesn't ground outputs against fact databases
  • Confident by default — RLHF training often rewards confident, helpful responses
  • Rare events — Long-tail knowledge has weak signal in training data

Inference Factors

  • Temperature — Higher temperature increases creative (hallucinatory) outputs
  • Context limitations — Missing information filled with plausible inventions
  • Prompt pressure — Users demanding answers push models past knowledge boundaries

Mitigation Strategies

Retrieval-Augmented Generation (RAG)

Ground responses in retrieved documents:

  • Reduces hallucination by providing factual context
  • Creates attribution trail for verification
  • Limitation: RAG itself can be poisoned with false information
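In practice, grounding comes down to how retrieved documents are framed in the prompt. A minimal sketch of the prompt-assembly step, assuming retrieval has already happened; the instruction wording and the [Source N] tag format are assumptions, not a standard:

```python
def grounded_prompt(question, documents):
    """Build a prompt that instructs the model to answer only from given sources."""
    context = "\n\n".join(
        f"[Source {i + 1}] {doc}" for i, doc in enumerate(documents)
    )
    return (
        "Answer using ONLY the sources below. If they do not contain the "
        "answer, reply 'Not found in provided sources.' Cite sources as "
        "[Source N].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The explicit refusal path ("Not found in provided sources") matters as much as the context itself: without it, a model under pressure to be helpful tends to fill gaps with plausible inventions, as in the meeting-time example above.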

Chain-of-Thought Verification

prompt = """
Question: {question}

Think through this step by step:
1. What specific facts do I need to answer this?
2. Do I actually know these facts from reliable sources?
3. If uncertain, clearly state "I'm not certain about..."
4. Provide answer only for claims I can support.

If I don't have reliable information, say "I don't have
verified information about this."
"""

Output Verification

  • Fact-checking layers — Second model or system verifies claims
  • Citation verification — Check if referenced sources actually exist
  • Code execution — Run generated code to verify functionality

Confidence Calibration

prompt = """
Rate your confidence in each claim:
- HIGH: Based on well-established facts from training
- MEDIUM: Likely correct but could be imprecise
- LOW: Uncertain, user should verify independently

{question}
"""

Human-in-the-Loop

  • Review high-stakes outputs before action
  • Verify citations and references manually
  • Don't automate decisions based solely on LLM claims

Detection Approaches

Consistency Checking

def check_consistency(model, question, n_samples=5):
    """Multiple samples should agree if factual"""
    responses = [model.generate(question) for _ in range(n_samples)]
    # High variance suggests uncertainty/hallucination
    return calculate_semantic_variance(responses)
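The sketch above leaves calculate_semantic_variance undefined. One lightweight stand-in, using only the standard library, is mean pairwise lexical dissimilarity; production systems would typically compare embeddings instead, but this illustrates the idea:

```python
from difflib import SequenceMatcher
from itertools import combinations

def calculate_semantic_variance(responses):
    """Mean pairwise dissimilarity: 0.0 = identical answers, near 1.0 = disjoint."""
    if len(responses) < 2:
        return 0.0
    pairs = list(combinations(responses, 2))
    dissimilarities = [
        1 - SequenceMatcher(None, a, b).ratio() for a, b in pairs
    ]
    return sum(dissimilarities) / len(pairs)
```

Identical samples score 0.0; the more the sampled answers disagree, the higher the score, which is the signal check_consistency thresholds on.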

Source Verification

def verify_citations(response):
    """Check if cited sources exist"""
    citations = extract_citations(response)
    verified = []
    for cite in citations:
        if is_real_source(cite):
            verified.append(cite)
        else:
            flag_hallucinated_citation(cite)
    return verified
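The helper extract_citations above is left abstract; a simple heuristic version for prose citations of the form 'Title' by Author et al. can be built with a regular expression (the pattern is an illustrative assumption and will miss other citation styles). is_real_source would need a lookup against an external index such as a DOI resolver and is not sketched here:

```python
import re

# Heuristic: quoted title followed by "by Surname" with optional "et al."
CITATION_PATTERN = re.compile(r"'([^']+)'\s+by\s+([A-Z][a-z]+(?: et al\.)?)")

def extract_citations(response):
    """Return (title, author) pairs for citations matching the heuristic pattern."""
    return CITATION_PATTERN.findall(response)
```

Run against the fabricated example from earlier, this pulls out the title and author pair that a verification layer would then try, and fail, to locate.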

Real-World Examples

Lawyer Uses ChatGPT (2023) — Attorney submitted legal brief with fabricated case citations generated by ChatGPT. None of the cases existed.

Package Hallucination Attacks (2024) — Researchers found LLMs consistently hallucinate the same package names; attackers could register these names with malicious code.

Medical Chatbot Hallucinations — Health-focused chatbots have provided fabricated medical advice and invented drug interactions.


References

  • Ji, Z. et al. (2023). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys.
  • Huang, L. et al. (2023). "A Survey on Hallucination in Large Language Models." arXiv.
  • Lanyado, B. (2024). "Can You Trust ChatGPT's Package Recommendations?" Vulcan Cyber.
  • OWASP (2023). "LLM09: Overreliance." OWASP Top 10 for LLM Applications.

Framework Mappings

  • NIST AI RMF — MEASURE 2.5, MANAGE 2.3
  • OWASP LLM Top 10 — LLM09: Overreliance
  • EU AI Act — Article 13: Transparency

Citation

Aizen, K. (2025). "Hallucination." AI Security Wiki, snailsploit.com. Retrieved from https://snailsploit.com/ai-security/wiki/concepts/hallucination/