This one is almost comically simple. Take a prompt the model refuses. Change it from present tense to past tense. “How do people do X” becomes “How did people do X in the 1800s?” The model answers, because apparently safety training doesn’t generalize across verb tenses.
The numbers
These results are from the paper:
- GPT-4o: 1% compliance in the present tense, 88% in the past tense
- Claude 3.5 Sonnet: 0% to 53%
- Phi-3-Mini: 6% to 82%
That’s not a subtle effect. That’s a nearly complete bypass from a one-word change.
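Measuring this is mechanically simple: reformulate each harmful prompt into the past tense, send it to the target model, and have a judge decide whether the response complied. Here's a minimal sketch with the reformulator, target model, and judge passed in as plain functions; the paper used GPT-3.5 Turbo for reformulation and an LLM-based judge, but the function names and single-attempt setup here are my own simplification:

```python
from typing import Callable

def attack_success_rate(
    prompts: list[str],
    reformulate: Callable[[str], str],  # present -> past tense rewrite
    query: Callable[[str], str],        # send a prompt to the target model
    is_harmful: Callable[[str], bool],  # judge: did the model comply?
) -> float:
    """Fraction of prompts whose past-tense rewrite elicits compliance.

    The paper sampled multiple reformulations per prompt and counted a
    success if any attempt worked; this sketch uses a single attempt.
    """
    if not prompts:
        return 0.0
    hits = sum(is_harmful(query(reformulate(p))) for p in prompts)
    return hits / len(prompts)
```

Running the same loop with an identity `reformulate` gives the present-tense baseline, which is how you get paired numbers like 1% vs. 88%.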
Why it works
Safety training data leans heavily on present-tense phrasings: “how to make X,” “tell me how to do Y.” The model learns to refuse those specific patterns, but it doesn’t generalize the refusal to “how was X made historically,” because that reads like a history question. The refusal behavior is pattern-matching on surface features of the request rather than on its intent.
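The rewrite itself is easy to automate. In the paper the reformulations were produced by prompting GPT-3.5 Turbo; even a toy rule-based version (these patterns are my own illustration, not the paper's prompt) shows how superficial the change is:

```python
import re

# Toy present -> past rewrites for common request openers. Illustrative only:
# the paper used an LLM to produce fluent past-tense reformulations.
_REWRITES = [
    (r"^how do people (.+?)\??$", r"How did people \1 in the past?"),
    (r"^how do i (.+?)\??$", r"How did people \1 historically?"),
    (r"^tell me how to (.+?)\.?$", r"How was it done when people used to \1?"),
]

def naive_past_tense(prompt: str) -> str:
    """Return a past-tense variant if a rule matches, else the prompt unchanged."""
    low = prompt.strip().lower()
    for pattern, template in _REWRITES:
        if re.match(pattern, low):
            return re.sub(pattern, template, low)
    return prompt
```

The point is not that three regexes jailbreak anything in particular; it's that the distance between a refused prompt and an answered one can be this small.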
What this tells us
Refusal training is brittle. It learns to refuse specific phrasings, not concepts. Any rephrasing the training data didn’t cover is a potential hole. Past tense is just the most obvious example. The researchers point out that this is fixable by including past-tense examples in training data, but it raises the question: how many other trivial reformulations haven’t been tested?
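The fix the authors propose is plain data work: fine-tune on past-tense paraphrases of refused requests, keeping the refusal as the target. A sketch of that augmentation step, where the record format and the `paraphrase` function are assumptions for illustration rather than the paper's code:

```python
from typing import Callable

def augment_with_past_tense(
    refusal_data: list[dict],
    paraphrase: Callable[[str], str],  # e.g. an LLM-based past-tense rewriter
) -> list[dict]:
    """Append a past-tense variant of each refused prompt, reusing the
    original refusal as the training target."""
    augmented = list(refusal_data)
    for ex in refusal_data:
        augmented.append({"prompt": paraphrase(ex["prompt"]),
                          "response": ex["response"]})
    return augmented
```

The same loop works for any other reformulation family you think of, which is exactly the worry: each patch covers one family at a time.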
Current status
Largely patched by major providers since the paper came out. But the broader lesson about brittle safety training still applies.
The paper
“Does Refusal Training in LLMs Generalize to the Past Tense?” by Andriushchenko and Flammarion (2024). EPFL. ICLR 2025.