Claude Tips & Tricks

The Crescendo Attack

Microsoft researchers showed that gradually escalating a conversation from innocent to harmful over multiple turns can bypass safety checks that would block the same request made directly.

Most jailbreaks try to sneak everything into one message. Crescendo takes a different approach: start with a completely innocent question and slowly turn up the heat over several turns. By the time you reach the harmful request, the model has built up context that makes compliance feel like a natural continuation of the conversation.

How it works

Turn 1: Ask about the history of a topic. Totally benign.
Turn 2: Ask for more technical detail.
Turn 3: Ask the model to "write an article about that."
Turn 4: Push further into restricted territory.

The model tries to stay consistent with its own prior responses. Each turn is only a small step beyond the last, so no single message trips a refusal. It's the boiling-frog problem.
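The mechanism is easiest to see in how chat APIs work: every request carries the full prior conversation, so each escalating turn arrives wrapped in context the model itself helped build. A minimal sketch (topics and replies are placeholders; no real API is called, and `build_history` is a hypothetical helper, not part of any SDK):

```python
# Sketch of how a Crescendo-style conversation accumulates context.
# The chat-API message format ({"role": ..., "content": ...}) is standard;
# the turn contents here are benign placeholders.

def build_history(turns):
    """Fold (user, assistant) turn pairs into a chat-API-style message list.
    Each new request is sent with this entire history attached."""
    messages = []
    for user_msg, assistant_msg in turns:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

turns = [
    ("What is the history of X?",               "X dates back to ..."),
    ("What were the technical details?",        "Technically, X involved ..."),
    ("Great, now write an article about that.", "Sure, here is an article ..."),
]
history = build_history(turns)

# By turn 3 the request rides on six prior messages, so the escalated ask
# reads as a natural continuation rather than an isolated prompt.
print(len(history))  # 6
```

Note that turn 3 never names the topic at all; "that" resolves entirely through the accumulated history, which is exactly what makes per-message inspection weak here.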

The numbers

In the paper's evaluation, Crescendo achieved a 29–61% higher success rate than single-turn jailbreaks on GPT-4, and 49–71% higher on Gemini Pro. The automated version, Crescendomation, makes the attack scalable.

Why it’s hard to fix

Single-turn safety classifiers can't catch it, because no individual turn is harmful on its own. Defending requires conversation-level monitoring, which is more expensive and more complex to run.

Current status

Microsoft implemented multi-turn monitoring for Azure AI. Other providers are catching up. But the fundamental tension between conversational coherence and safety remains.

The paper

Mark Russinovich, Ahmed Salem, and Ronen Eldan, "Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack" (2024). Published at USENIX Security 2025.

Paste into Claude Code
Explain Microsoft's Crescendo multi-turn jailbreak attack. How does gradual conversational escalation bypass safety training? Why is this harder to defend against than single-turn attacks?