
Hallucination

AI failure mode where language models generate false or fabricated information with unwarranted confidence, creating security risks in automated systems.

Last updated: January 24, 2025

Definition

Hallucination in AI refers to the generation of content that is factually incorrect, fabricated, or not grounded in the model's training data or provided context—delivered with the same confidence as accurate information. The term captures the model's "confident confabulation."

While often discussed as an accuracy problem, hallucination has significant security implications when LLM outputs are used in automated systems, decision pipelines, or anywhere human oversight is limited.


Types of Hallucination

Factual Hallucination

Generating false facts:

User: "When was the Golden Gate Bridge built?"
LLM: "The Golden Gate Bridge was completed in 1942."
     (Actual: 1937)

User: "Who wrote 'The Great Gatsby'?"
LLM: "Ernest Hemingway wrote The Great Gatsby."
     (Actual: F. Scott Fitzgerald)

Source Hallucination

Inventing citations and references:

User: "Cite a paper on prompt injection."
LLM: "See 'Prompt Injection Attacks on LLMs' by Smith et al.,
      published in IEEE S&P 2022."
      (Paper doesn't exist)

Context Hallucination

Adding details not present in provided context:

Context: "The meeting is scheduled for Tuesday."
User: "What time is the meeting?"
LLM: "The meeting is at 2:00 PM on Tuesday."
      (Time was never specified)

Code Hallucination

Generating non-existent APIs, functions, or libraries:

# LLM-generated code referencing fake API
from langchain.security import PromptInjectionFilter  # Doesn't exist
from openai.safety import ContentModerator  # Doesn't exist

result = prompt_sanitizer.clean(input)  # Fabricated function
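Hallucinated imports like these can often be caught statically, before generated code is ever executed. A minimal sketch using only the Python standard library; the function name unresolvable_imports is illustrative, not an established tool:

```python
import ast
import importlib.util

def unresolvable_imports(source):
    """Return top-level module names in source that cannot be resolved locally."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    # find_spec returns None when no installed package provides the module
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

Any name this flags is either a hallucination or a dependency not yet installed; either way, it deserves review before the code runs, since registering the hallucinated name on a package index is exactly the attack described under Ghost dependencies below.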

Security Implications

Automated Pipeline Risks

When hallucinated outputs feed into automated systems:

  • Fake vulnerability reports — LLM invents CVEs, leading to wasted remediation effort
  • Incorrect configurations — Hallucinated settings create security gaps
  • Fabricated compliance data — False audit results in regulatory systems
  • Ghost dependencies — Code using non-existent packages (potential supply chain attack vector)

Decision Support Failures

Hallucination in advisory contexts:

  • Fabricated threat intelligence leading to misallocated resources
  • Invented precedents in legal or policy decisions
  • False financial data influencing investment decisions

Agent-Specific Risks

AI agents acting on hallucinated information:

# Agent hallucinates the existence of a cleanup function
Agent thought: "I should call security_cleanup() to finish"
Agent action: Execute code calling security_cleanup()
Result: Error, or worse—calling a malicious function
        with a similar name that does exist
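One defense is to resolve every tool name an agent proposes against an explicit allowlist and registry before execution, so a hallucinated (or deliberately similar) name fails closed rather than falling through to whatever happens to exist. A minimal sketch; the tool names are illustrative:

```python
ALLOWED_TOOLS = {"search_logs", "fetch_report"}  # explicit allowlist (illustrative)

def dispatch_tool(name, registry):
    """Resolve an agent-proposed tool name, failing closed on anything unknown."""
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {name!r} is not on the allowlist")
    if name not in registry:
        raise ValueError(f"Tool {name!r} is allowed but not registered")
    return registry[name]
```

With this in place, a hallucinated call to security_cleanup() raises an error instead of silently executing look-alike code.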

Why LLMs Hallucinate

Training Dynamics

  • Pattern completion — Models learn to produce plausible-sounding text, not verified facts
  • No knowledge verification — Training doesn't ground outputs against fact databases
  • Confident by default — RLHF training often rewards confident, helpful responses
  • Rare events — Long-tail knowledge has weak signal in training data

Inference Factors

  • Temperature — Higher temperature increases creative (hallucinatory) outputs
  • Context limitations — Missing information filled with plausible inventions
  • Prompt pressure — Users demanding answers push models past knowledge boundaries

Mitigation Strategies

Retrieval-Augmented Generation (RAG)

Ground responses in retrieved documents:

  • Reduces hallucination by providing factual context
  • Creates attribution trail for verification
  • Limitation: RAG itself can be poisoned with false information
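In practice, grounding comes down to how retrieved documents are framed in the prompt. A minimal sketch of the prompt-assembly step, assuming retrieval has already happened; the instruction wording and the [Source N] tag format are assumptions, not a standard:

```python
def grounded_prompt(question, documents):
    """Build a prompt that instructs the model to answer only from given sources."""
    context = "\n\n".join(
        f"[Source {i + 1}] {doc}" for i, doc in enumerate(documents)
    )
    return (
        "Answer using ONLY the sources below. If they do not contain the "
        "answer, reply 'Not found in provided sources.' Cite sources as "
        "[Source N].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The explicit refusal path ("Not found in provided sources") matters as much as the context itself: without it, a model under pressure to be helpful tends to fill gaps with plausible inventions, as in the meeting-time example above.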

Chain-of-Thought Verification

prompt = """
Question: {question}

Think through this step by step:
1. What specific facts do I need to answer this?
2. Do I actually know these facts from reliable sources?
3. If uncertain, clearly state "I'm not certain about..."
4. Provide answer only for claims I can support.

If I don't have reliable information, say "I don't have
verified information about this."
"""

Output Verification

  • Fact-checking layers — Second model or system verifies claims
  • Citation verification — Check if referenced sources actually exist
  • Code execution — Run generated code to verify functionality

Confidence Calibration

prompt = """
Rate your confidence in each claim:
- HIGH: Based on well-established facts from training
- MEDIUM: Likely correct but could be imprecise
- LOW: Uncertain, user should verify independently

{question}
"""

Human-in-the-Loop

  • Review high-stakes outputs before action
  • Verify citations and references manually
  • Don't automate decisions based solely on LLM claims

Detection Approaches

Consistency Checking

def check_consistency(model, question, n_samples=5):
    """Multiple samples should agree if factual"""
    responses = [model.generate(question) for _ in range(n_samples)]
    # High variance suggests uncertainty/hallucination
    return calculate_semantic_variance(responses)
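The sketch above leaves calculate_semantic_variance undefined. One lightweight stand-in, using only the standard library, is mean pairwise lexical dissimilarity; production systems would typically compare embeddings instead, but this illustrates the idea:

```python
from difflib import SequenceMatcher
from itertools import combinations

def calculate_semantic_variance(responses):
    """Mean pairwise dissimilarity: 0.0 = identical answers, near 1.0 = disjoint."""
    if len(responses) < 2:
        return 0.0
    pairs = list(combinations(responses, 2))
    dissimilarities = [
        1 - SequenceMatcher(None, a, b).ratio() for a, b in pairs
    ]
    return sum(dissimilarities) / len(pairs)
```

Identical samples score 0.0; the more the sampled answers disagree, the higher the score, which is the signal check_consistency thresholds on.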

Source Verification

def verify_citations(response):
    """Check if cited sources exist"""
    citations = extract_citations(response)
    verified = []
    for cite in citations:
        if is_real_source(cite):
            verified.append(cite)
        else:
            flag_hallucinated_citation(cite)
    return verified
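The helper extract_citations above is left abstract; a simple heuristic version for prose citations of the form 'Title' by Author et al. can be built with a regular expression (the pattern is an illustrative assumption and will miss other citation styles). is_real_source would need a lookup against an external index such as a DOI resolver and is not sketched here:

```python
import re

# Heuristic: quoted title followed by "by Surname" with optional "et al."
CITATION_PATTERN = re.compile(r"'([^']+)'\s+by\s+([A-Z][a-z]+(?: et al\.)?)")

def extract_citations(response):
    """Return (title, author) pairs for citations matching the heuristic pattern."""
    return CITATION_PATTERN.findall(response)
```

Run against the fabricated example from earlier, this pulls out the title and author pair that a verification layer would then try, and fail, to locate.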

Real-World Examples

Lawyer Uses ChatGPT (2023) — Attorney submitted legal brief with fabricated case citations generated by ChatGPT. None of the cases existed.

Package Hallucination Attacks (2024) — Researchers found LLMs consistently hallucinate the same package names; attackers could register these names with malicious code.

Medical Chatbot Hallucinations — Health-focused chatbots have provided fabricated medical advice and invented drug interactions.


References

  • Ji, Z. et al. (2023). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys.
  • Huang, L. et al. (2023). "A Survey on Hallucination in Large Language Models." arXiv.
  • Lanyado, B. (2024). "Can You Trust ChatGPT's Package Recommendations?" Vulcan Cyber.
  • OWASP (2023). "LLM09: Overreliance." OWASP Top 10 for LLM Applications.

Framework Mappings

  • NIST AI RMF — MEASURE 2.5, MANAGE 2.3
  • OWASP LLM Top 10 — LLM09: Overreliance
  • EU AI Act — Article 13: Transparency

Citation

Aizen, K. (2025). "Hallucination." AI Security Wiki, snailsploit.com. Retrieved from https://snailsploit.com/ai-security/wiki/concepts/hallucination/