If you send the same system prompt or large context repeatedly (e.g., in a chatbot or pipeline), prompt caching can cut your input token costs by up to 90%.
How it works:
- Cache write (first request): Costs 1.25x the base input price (5-min TTL) or 2x (1-hour TTL)
- Cache read (subsequent requests): Costs only 0.1x the base input price
- The cache breaks even after just one read for 5-minute caching, or two reads for 1-hour caching
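Those break-even points follow directly from the multipliers. A quick sketch, with prices normalized so the base input price is 1.0:

```python
# Normalized costs: standard input = 1.0, per the multipliers above.
WRITE_5MIN, WRITE_1HR, READ = 1.25, 2.0, 0.10

def cached_cost(write_mult: float, reads: int) -> float:
    """One cache write followed by `reads` cache reads."""
    return write_mult + reads * READ

def uncached_cost(requests: int) -> float:
    """The same requests sent without caching."""
    return float(requests)

# 5-minute cache: already cheaper after a single read (2 requests total).
assert cached_cost(WRITE_5MIN, 1) < uncached_cost(2)  # 1.35 < 2.0
# 1-hour cache: still behind after one read, ahead after two.
assert cached_cost(WRITE_1HR, 1) > uncached_cost(2)   # 2.10 > 2.0
assert cached_cost(WRITE_1HR, 2) < uncached_cost(3)   # 2.20 < 3.0
```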
Example with Claude Sonnet 4 ($3/MTok base):
| Operation | Cost per MTok |
|---|---|
| Standard input | $3.00 |
| 5-min cache write | $3.75 |
| 1-hour cache write | $6.00 |
| Cache read | $0.30 |
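To put the table in dollar terms, here is a hypothetical workload (the 10K-token prompt size and 100-request volume are illustrative assumptions, not from Anthropic's docs): a shared prefix reused across 100 requests, all landing within the 5-minute window.

```python
BASE, WRITE_5MIN, READ = 3.00, 3.75, 0.30  # $/MTok, from the table above
prompt_mtok = 10_000 / 1_000_000           # hypothetical 10K-token shared prefix
requests = 100

without_cache = requests * prompt_mtok * BASE
with_cache = prompt_mtok * (WRITE_5MIN + (requests - 1) * READ)

print(f"without cache: ${without_cache:.4f}")  # $3.0000
print(f"with cache:    ${with_cache:.4f}")     # $0.3345
print(f"savings:       {1 - with_cache / without_cache:.1%}")
```

The savings approach the full 90% as the number of reads grows, since the one-time write premium is amortized away.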
Implementation:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a legal contract analyzer...",  # your long system prompt
            "cache_control": {"type": "ephemeral"},  # enables 5-min caching
        }
    ],
    messages=[{"role": "user", "content": "Analyze this contract..."}],
)
```
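For the 1-hour TTL, the request shape is the same with an added `ttl` field on `cache_control`. The sketch below only builds the parameter dict (no API call is made), assuming the `ttl` field format from Anthropic's current API:

```python
# A sketch of the 1-hour variant: same structure, plus "ttl": "1h".
# Pass the dict to client.messages.create(**params) to send it.
params = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a legal contract analyzer...",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},  # 1-hour TTL
        }
    ],
    "messages": [{"role": "user", "content": "Analyze this contract..."}],
}

assert params["system"][0]["cache_control"]["ttl"] == "1h"
```

On the response, `response.usage` reports `cache_creation_input_tokens` on the first call and `cache_read_input_tokens` on subsequent calls, which is how you can confirm the cache is actually being hit.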
Best candidates for caching: system prompts, few-shot examples, reference documentation, and any large context that stays identical across requests (the cached prefix must match exactly). Combine with the Batch API (an additional 50% off) for maximum savings.
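Assuming the two discounts stack, as combining them implies, a batched cache read costs roughly 0.05x the base input price:

```python
BASE = 3.00             # $/MTok, Claude Sonnet 4 standard input
CACHE_READ_MULT = 0.10  # cache read = 0.1x base
BATCH_MULT = 0.50       # Batch API = 50% off

batched_cache_read = BASE * CACHE_READ_MULT * BATCH_MULT
print(f"${batched_cache_read:.2f}/MTok")  # $0.15/MTok, ~95% below the $3.00 base
```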