LLM Jailbreaking Research
Jailbreaking represents one of the most challenging problems in AI safety. Unlike technical exploits, jailbreaks work by manipulating how AI models reason, role-play, and follow instructions. This research documents novel attack techniques discovered through systematic testing—from context inheritance exploits that persist across sessions to memory poisoning attacks that corrupt AI judgment over time. The goal isn't to enable harm, but to understand these vulnerabilities deeply enough to build better defenses. Each technique here has been responsibly disclosed and is shared to advance the collective understanding of AI security.
Start Here
This research is part of the broader AI Security Research hub. Methodology is documented in the AATMF framework.
TheJailBreakChef Engine
Apply AATMF attack phases, PHLRA context injection, and Cialdini principles interactively. Transform raw intent into structured adversarial prompts.
Launch Engine →
Key Concepts
- Multi-Turn Jailbreak
- Attack technique that gradually manipulates AI context over multiple conversation turns, building towards guardrail bypass without triggering immediate safety responses.
- Context Inheritance
- The phenomenon where jailbroken states persist or transfer across sessions, allowing attackers to leverage previously compromised conversations. Learn more →
- Memory Poisoning
- Injecting malicious content into AI conversation memory or context windows to influence future responses and bypass safety measures. Learn more →
- Role Hijacking
- Convincing an AI to adopt an alternative persona that bypasses its normal safety constraints, often through elaborate fictional scenarios.
- Guardrail Bypass
- Any technique that circumvents AI safety filters, content policies, or behavioral constraints to elicit restricted outputs.
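The memory-poisoning defense implied above can be sketched as a simple pre-storage filter: before an entry is written into an AI system's persistent memory, screen it for instruction-like content that reads as a directive rather than a fact. This is a minimal, hypothetical heuristic (the pattern list and function names are illustrative, not a production filter):

```python
import re

# Heuristic patterns suggesting instruction-like content an attacker
# might plant in persistent memory (illustrative, not exhaustive).
INSTRUCTION_PATTERNS = [
    r"(?i)\bignore (all )?(previous|prior) instructions\b",
    r"(?i)\byou are now\b",
    r"(?i)\bfrom now on\b",
    r"(?i)\bsystem prompt\b",
]

def is_suspicious_memory_entry(entry: str) -> bool:
    """Flag memory entries that read like directives rather than facts."""
    return any(re.search(p, entry) for p in INSTRUCTION_PATTERNS)

def sanitize_memory(entries: list[str]) -> list[str]:
    """Drop entries matching known instruction-injection patterns
    before they are written into the model's long-term memory."""
    return [e for e in entries if not is_suspicious_memory_entry(e)]
```

Pattern matching like this is easily evaded by paraphrase or encoding, which is exactly why the taxonomy below treats memory poisoning as a structural problem rather than a filtering problem; the sketch shows the shape of the defense, not its sufficiency.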
Frequently Asked Questions
What is the difference between jailbreaking and prompt injection?
Jailbreaking manipulates the AI through conversational techniques alone: it exploits the model's reasoning and role-play capabilities. Prompt injection, by contrast, inserts malicious instructions through user input or external data sources. In short, jailbreaking is psychological manipulation; injection is input exploitation.
Can jailbreaks persist across sessions?
Yes, through context inheritance. When jailbroken conversation transcripts are pasted into new sessions or when systems lack proper context isolation, the compromised state can transfer. This is documented in our Context Inheritance Exploit research.
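One concrete mitigation for context inheritance is to refuse to seed a new session with role-labelled history smuggled in through the first message. A minimal sketch, assuming an OpenAI-style message list; the marker regex and threshold are assumptions to tune per deployment:

```python
import re

# Role markers that typically appear when a prior conversation
# transcript is pasted into a fresh session (hypothetical heuristic).
TRANSCRIPT_MARKERS = re.compile(r"(?im)^\s*(system|assistant|user)\s*:")

def looks_like_pasted_transcript(user_input: str) -> bool:
    """Detect input embedding two or more role-labelled turns from
    another session, a common context-inheritance vector."""
    return len(TRANSCRIPT_MARKERS.findall(user_input)) >= 2

def start_session(user_input: str) -> list[dict]:
    """Seed a new session from a clean system prompt only."""
    if looks_like_pasted_transcript(user_input):
        raise ValueError("possible context-inheritance attempt: pasted transcript detected")
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ]
```

The key design choice is that session state is constructed fresh on every call rather than carried over, so a compromised transcript cannot silently become the new baseline context.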
Why do AI safety measures fail against jailbreaks?
AI models are trained to be helpful and follow instructions. Jailbreaks exploit this by framing harmful requests in ways that appear benign or by gradually shifting context. The fundamental challenge is that models cannot reliably distinguish between legitimate creative requests and adversarial manipulation.
Is jailbreaking research ethical?
Responsible jailbreaking research improves AI safety by identifying vulnerabilities before malicious actors exploit them. All research here follows ethical guidelines: findings are disclosed responsibly, techniques are shared to help defenders, and no actual harmful content is produced.
All Articles
LLM Jailbreak Techniques: A Technical Taxonomy
Complete taxonomy of LLM jailbreak techniques — role hijacking, multi-turn escalation, context manipulation, encoding exploits, and chain-of-thought abuse. How each technique works and why alignment training fails against it.
Context Inheritance Exploit: Jailbroken Conversations Don't Die
Discovering how jailbroken states persist across GPT sessions through context inheritance.
The Memory Manipulation Problem
How attackers poison AI context windows and memory systems.
How I Jailbroke ChatGPT Using Context Manipulation
Step-by-step walkthrough of jailbreaking ChatGPT using social awareness techniques.
Inherent Vulnerabilities in AI Systems
Technical analysis of structural vulnerabilities in AI systems.
Is AI Inherently Vulnerable?
Examining the fundamental security limitations of large language models.