Claude Tips & Tricks
API Tips · Intermediate

Use the Batch API for 50% Off Bulk Operations

Process large volumes of requests asynchronously with the Batch API at half the cost of standard API calls.

If you have workloads that don’t need real-time responses (data processing, classification, summarization, evaluation), the Batch API cuts your costs in half.

How it works:

  1. Submit a batch of requests (up to thousands)
  2. Anthropic processes them asynchronously
  3. Results are typically ready within minutes, guaranteed within 24 hours
  4. You pay 50% of the standard per-token price on both input and output tokens

Example:

import anthropic

client = anthropic.Anthropic()

# Create a batch
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "review-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ]
            }
        },
        {
            "custom_id": "review-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ]
            }
        }
        # ... more requests
    ]
)

# Check status later; processing_status becomes "ended" when all
# requests have finished
batch = client.messages.batches.retrieve(batch.id)
print(batch.processing_status)

# Once ended, stream the results (matched back to requests by custom_id)
for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)

Combine with prompt caching for maximum savings. Batch API (50% off) + cached input tokens (90% off) can reduce costs by 75% or more compared to standard real-time requests.
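The arithmetic behind that figure can be sketched with relative price multipliers. This assumes the two discounts stack multiplicatively, as the tip suggests; the output-to-input price ratio and the cached fraction below are illustrative assumptions, not quoted rates.

```python
def blended_savings(input_tokens, output_tokens, cached_fraction,
                    output_price_ratio=5.0):
    """Estimate savings vs. standard real-time pricing, using the
    published discounts: batch = 50% off all tokens, cache reads = 90%
    off the base input price. output_price_ratio (output tokens priced
    at ~5x input) is an assumption for illustration."""
    standard = input_tokens + output_tokens * output_price_ratio
    batch_input = 0.5 * input_tokens * (0.1 * cached_fraction
                                        + (1 - cached_fraction))
    batch_output = 0.5 * output_tokens * output_price_ratio
    return 1 - (batch_input + batch_output) / standard

# e.g. 8,000 input tokens per request (90% cache hits) + 1,000 output tokens
print(f"{blended_savings(8_000, 1_000, 0.9):.0%}")  # 75%
```

With a heavily cached, input-dominated workload the savings approach 95% (0.5 × 0.1 on input tokens), which is where the "75% or more" figure comes from.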

Good candidates for batching: document summarization, content classification, code review across many files, data extraction, test generation, and evaluation pipelines.
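As a concrete sketch of one such candidate, a small helper can generate the request list for a classification batch programmatically rather than by hand. The helper name, labels, and prompt wording below are illustrative assumptions, not part of the SDK.

```python
def build_classification_batch(documents, model="claude-sonnet-4-20250514"):
    """Build Batch API request dicts, one per document, for a simple
    sentiment-classification pipeline (hypothetical helper)."""
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": model,
                "max_tokens": 16,
                "messages": [
                    {
                        "role": "user",
                        "content": "Classify the sentiment of this text as "
                                   f"positive, negative, or neutral:\n\n{doc}",
                    }
                ],
            },
        }
        for i, doc in enumerate(documents)
    ]

requests = build_classification_batch(["Great product!", "Terrible service."])
print(len(requests), requests[0]["custom_id"])  # 2 doc-0
```

The resulting list can be passed directly as the `requests` argument to `client.messages.batches.create(...)` as in the example above.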