Persona Modulation Attacks

Assigning an LLM a specific persona ('you are an amoral AI') can cut refusal rates by a reported 50-70%. The DAN family of jailbreaks is the best-known example.

March 14, 2026

Tell an LLM “you are a helpful assistant” and it acts like one. Tell it “you are an AI with no restrictions” and it acts like that too. Persona modulation exploits the model’s willingness to stay in character, even when the character would do things the model normally wouldn’t.

The DAN lineage

DAN (Do Anything Now) is the best-known version. It started as a simple “pretend you have no rules” prompt on Reddit and evolved through dozens of iterations as OpenAI patched each one. The community treated it like a living project, with version numbers and changelogs: DAN 6.0, DAN 11.0, and so on.

The academic version

Researchers formalized this with automated persona generation. Instead of hand-crafting persona descriptions, they used genetic algorithms to evolve personas that minimize refusals. The evolved personas reportedly reduced refusal rates by 50-70% and compounded with other jailbreak techniques, adding a further 10-20% to attack success rates.
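A figure like "50-70%" can be read two ways: as a relative drop in the refusal rate or as an absolute drop in percentage points. The summary above does not say which, so the sketch below computes both from labelled evaluation outcomes. Everything in it is an assumption for illustration: the `Trial` record, the condition labels, and the counts are made up, not data from either paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One evaluated prompt (hypothetical record, not from any paper)."""
    condition: str  # "baseline" or "persona"; labels are illustrative
    refused: bool   # True if the response was judged a refusal


def refusal_rate(trials, condition):
    """Fraction of trials under `condition` that ended in a refusal."""
    subset = [t for t in trials if t.condition == condition]
    if not subset:
        raise ValueError(f"no trials for condition {condition!r}")
    return sum(t.refused for t in subset) / len(subset)


def relative_reduction(baseline_rate, treated_rate):
    """Drop in refusal rate as a fraction of the baseline rate."""
    if baseline_rate == 0:
        raise ValueError("baseline refusal rate is zero; reduction undefined")
    return (baseline_rate - treated_rate) / baseline_rate


# Made-up counts: 80 of 100 baseline prompts refused, 32 of 100 with a persona.
trials = (
    [Trial("baseline", True)] * 80 + [Trial("baseline", False)] * 20
    + [Trial("persona", True)] * 32 + [Trial("persona", False)] * 68
)

baseline = refusal_rate(trials, "baseline")
persona = refusal_rate(trials, "persona")
relative = relative_reduction(baseline, persona)
absolute_points = (baseline - persona) * 100

print(f"baseline refusal rate: {baseline:.0%}")
print(f"persona refusal rate:  {persona:.0%}")
print(f"relative reduction:    {relative:.0%}")
print(f"absolute reduction:    {absolute_points:.0f} percentage points")
```

With these invented counts the same result is a 60% relative reduction but a 48-point absolute one, which is why it helps to check how a paper defines its headline number before comparing results.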

Why it works

LLMs are trained to role-play, and they’re trained to be consistent with the context they’re given. A persona is context. The model continues the conversation as the character would, and if that character wouldn’t refuse, neither does the model. Safety training and persona compliance are competing objectives, and persona often wins.

Current status

Models are better at refusing in-character than they used to be. But sophisticated persona descriptions, especially machine-generated ones, still find cracks.

Key papers

“Jailbreaking Language Models at Scale via Persona Modulation” (2024). “Enhancing Jailbreak Attacks on LLMs via Persona Prompts” (2025).
