
TAP: Tree of Attacks with Pruning

Researchers from Yale and Robust Intelligence built on PAIR by adding tree-of-thought branching. The result jailbreaks GPT-4 on 80%+ of prompts, fully automatically.

PAIR uses one LLM to jailbreak another in a straight line: try, fail, refine, try again. TAP adds branching. At each step, the attacker generates multiple candidate prompts, an evaluator prunes the ones that are off-topic or unlikely to work, and the survivors branch into the next round. It’s the difference between walking down a hallway and exploring a maze with a map.

How it works

Three LLMs play distinct roles: attacker, evaluator, and target. The attacker generates jailbreak candidates. The evaluator does double duty: it prunes candidates that have drifted off-topic before they ever reach the target, then scores the target's responses for how close they come to a successful jailbreak. High-scoring branches survive into the next round, so the search converges on effective attacks much faster than linear approaches.

The tree structure means it explores more of the attack surface while spending less total compute. Better coverage, fewer wasted queries.
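The branch-and-prune loop can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: `attacker_generate`, `on_topic`, and `jailbreak_score` are deterministic stand-ins for the three model calls, so only the tree-search skeleton is real.

```python
# Minimal sketch of TAP's branch-and-prune loop (toy stubs, not the paper's code).
# In a real run, each helper below would be a separate LLM call.

BRANCH = 4   # candidates each leaf spawns per round (branching factor)
WIDTH = 6    # leaves kept after pruning (beam width)
DEPTH = 5    # maximum tree depth before giving up

def attacker_generate(leaf):
    """Stub attacker LLM: refine one prompt into BRANCH new candidates."""
    return [f"{leaf} [variant {i}]" for i in range(BRANCH)]

def on_topic(candidate, goal):
    """Stub evaluator, phase 1: prune candidates that drifted off-goal.
    Every branch cut here saves a query to the target."""
    return goal in candidate

def jailbreak_score(candidate):
    """Stub evaluator, phase 2: rate a reply 1-10. A real evaluator
    would read the target LLM's actual response to the candidate."""
    return sum(map(ord, candidate)) % 10 + 1

def tap_search(goal):
    leaves = [goal]                           # root of the attack tree
    for depth in range(1, DEPTH + 1):
        # 1. Branch: every surviving leaf spawns several refinements.
        candidates = [c for leaf in leaves for c in attacker_generate(leaf)]
        # 2. First prune: drop off-topic branches before querying the target.
        candidates = [c for c in candidates if on_topic(c, goal)]
        # 3. Query the target, then second prune: keep the top-WIDTH by score.
        leaves = sorted(candidates, key=jailbreak_score, reverse=True)[:WIDTH]
        if leaves and jailbreak_score(leaves[0]) == 10:
            return leaves[0], depth           # evaluator judged it jailbroken
    return None, DEPTH                        # query budget exhausted

best, rounds = tap_search("goal")
print(rounds)  # prints 2 with these toy stubs
```

The two-stage pruning is the key efficiency trick: off-topic branches die before costing a target query, and weak branches die before spawning children, so the query budget concentrates on the most promising paths.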

The numbers

80%+ success rate on GPT-4-Turbo and GPT-4o. Also bypassed LlamaGuard, which is specifically designed to catch jailbreaks. That’s the state of the art for automated black-box attacks as of 2024.

Why it matters for defense

If you’re building an AI application and want to test your safety measures, TAP represents what a motivated automated attacker looks like. Your defenses need to hold against something this systematic, not just against humans trying things by hand.

Current status

Hard to defend against because it adapts. The branching search finds novel attack paths that static defenses don’t anticipate.

The paper

“Tree of Attacks: Jailbreaking Black-Box LLMs Automatically” by Mehrotra et al. (2023). NeurIPS 2024.

Paste into Claude Code
Explain the TAP (Tree of Attacks with Pruning) jailbreak from Yale. How does tree-of-thought reasoning make automated jailbreaking more efficient than PAIR?