When Claude processes long documents, it can subtly fabricate details that sound plausible but aren’t in the source material. The extract-then-synthesize pattern fixes this.
Step 1: Extract
Ask Claude to pull exact, word-for-word quotes:
Given the following contract document, extract every passage that mentions
payment terms, deadlines, or penalties. Quote them exactly. Do not paraphrase.
<document>
{contract_text}
</document>
Step 2: Synthesize
Feed the extracted quotes back and ask for analysis:
Based ONLY on these extracted passages, summarize the payment terms
and identify any risks. If something isn't covered in the quotes,
say "not addressed in the document."
<extracted_quotes>
{quotes_from_step_1}
</extracted_quotes>
Why This Works
- Forces Claude to ground claims in actual text
- “Quote exactly” prevents paraphrasing drift
- “ONLY based on these passages” prevents filling gaps with fabrications
- Giving Claude permission to say “not addressed” reduces pressure to confabulate
For Long Documents (20K+ tokens)
This matters most for very long documents. Anthropic recommends extracting word-for-word quotes first before performing analysis on documents over 20K tokens.