If you have workloads that don’t need real-time responses (data processing, classification, summarization, evaluation), the Batch API cuts your per-token costs in half.
How it works:
- Submit a batch of requests (up to thousands)
- Anthropic processes them asynchronously
- Results are typically ready within minutes, guaranteed within 24 hours
- You pay 50% of the standard per-token price on both input and output tokens
Example:
```python
import anthropic

client = anthropic.Anthropic()

# Create a batch
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "review-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ],
            },
        },
        {
            "custom_id": "review-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ],
            },
        },
        # ... more requests
    ]
)

# Check processing status later; note that retrieve returns batch metadata
# (including processing_status), not the per-request results themselves.
status = client.messages.batches.retrieve(batch.id)
print(status.processing_status)
```
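Since batches complete asynchronously, a typical workflow polls the batch's `processing_status` until it reaches `"ended"`, then fetches per-request results. A minimal polling sketch, written against a generic `retrieve` callable so the logic is easy to test; with the real SDK you would pass `client.messages.batches.retrieve` (the interval and timeout values here are assumptions, not recommendations):

```python
import time


def wait_for_batch(retrieve, batch_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll until the batch's processing_status is 'ended', then return it."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} did not finish within the timeout")


# With the real SDK, once the batch has ended you can iterate its results:
#
#   wait_for_batch(client.messages.batches.retrieve, batch.id)
#   for entry in client.messages.batches.results(batch.id):
#       if entry.result.type == "succeeded":
#           print(entry.custom_id, entry.result.message.content)
```

Each result entry carries the `custom_id` you supplied, which is how you match outputs back to inputs: results are not guaranteed to come back in submission order.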
Combine with prompt caching for maximum savings. Batch API (50% off) + cached input tokens (90% off) can reduce costs by 75% or more compared to standard real-time requests.
Good candidates for batching: document summarization, content classification, code review across many files, data extraction, test generation, and evaluation pipelines.
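To see how the combined savings can clear 75%, here is a back-of-envelope calculation. The prices and token mix are illustrative assumptions, not official pricing, and cache-write surcharges are ignored for simplicity:

```python
# Assumed base rates in $/MTok and an input-heavy, mostly-cached workload:
# 900k cached input, 100k fresh input, 100k output tokens.
BASE_IN, BASE_OUT = 3.00, 15.00          # illustrative, not official pricing
cached_in, fresh_in, out = 0.9, 0.1, 0.1  # millions of tokens

# Standard real-time cost: every token at full price.
standard = (cached_in + fresh_in) * BASE_IN + out * BASE_OUT

# Combined: cache reads at 10% of the input rate, then batch halves every rate.
combined = (
    cached_in * BASE_IN * 0.10 * 0.5
    + fresh_in * BASE_IN * 0.5
    + out * BASE_OUT * 0.5
)

savings = 1 - combined / standard
print(f"standard ${standard:.2f} vs combined ${combined:.2f} ({savings:.0%} cheaper)")
```

With this mix the blended discount lands above 75%; the more of your input that is cached, the closer the input side gets to a 95% discount, while output tokens stay at the flat 50% batch discount.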