If you have workloads that don’t need real-time responses (data processing, classification, summarization, evaluation), the Batch API cuts your per-token costs in half.
How it works:
- Submit a batch of requests (up to thousands)
- Anthropic processes them asynchronously
- Results are typically ready within minutes, guaranteed within 24 hours
- You pay 50% of the standard per-token price on both input and output tokens
Example:
```python
import anthropic

client = anthropic.Anthropic()

# Create a batch
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "review-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ],
            },
        },
        {
            "custom_id": "review-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ],
            },
        },
        # ... more requests
    ]
)

# Check processing status later; note that retrieve returns batch metadata
# (including processing_status), not the per-request results themselves.
status = client.messages.batches.retrieve(batch.id)
print(status.processing_status)
```
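Since batches complete asynchronously, a typical workflow polls the batch's `processing_status` until it reaches `"ended"`, then fetches per-request results. A minimal polling sketch, written against a generic `retrieve` callable so the logic is easy to test; with the real SDK you would pass `client.messages.batches.retrieve` (the interval and timeout values here are assumptions, not recommendations):

```python
import time


def wait_for_batch(retrieve, batch_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll until the batch's processing_status is 'ended', then return it."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} did not finish within the timeout")


# With the real SDK, once the batch has ended you can iterate its results:
#
#   wait_for_batch(client.messages.batches.retrieve, batch.id)
#   for entry in client.messages.batches.results(batch.id):
#       if entry.result.type == "succeeded":
#           print(entry.custom_id, entry.result.message.content)
```

Each result entry carries the `custom_id` you supplied, which is how you match outputs back to inputs: results are not guaranteed to come back in submission order.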
Combine with prompt caching for maximum savings. Batch API (50% off) + cached input tokens (90% off) can reduce costs by 75% or more compared to standard real-time requests.
Good candidates for batching: document summarization, content classification, code review across many files, data extraction, test generation, and evaluation pipelines.
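To see how the combined savings can clear 75%, here is a back-of-envelope calculation. The prices and token mix are illustrative assumptions, not official pricing, and cache-write surcharges are ignored for simplicity:

```python
# Assumed base rates in $/MTok and an input-heavy, mostly-cached workload:
# 900k cached input, 100k fresh input, 100k output tokens.
BASE_IN, BASE_OUT = 3.00, 15.00          # illustrative, not official pricing
cached_in, fresh_in, out = 0.9, 0.1, 0.1  # millions of tokens

# Standard real-time cost: every token at full price.
standard = (cached_in + fresh_in) * BASE_IN + out * BASE_OUT

# Combined: cache reads at 10% of the input rate, then batch halves every rate.
combined = (
    cached_in * BASE_IN * 0.10 * 0.5
    + fresh_in * BASE_IN * 0.5
    + out * BASE_OUT * 0.5
)

savings = 1 - combined / standard
print(f"standard ${standard:.2f} vs combined ${combined:.2f} ({savings:.0%} cheaper)")
```

With this mix the blended discount lands above 75%; the more of your input that is cached, the closer the input side gets to a 95% discount, while output tokens stay at the flat 50% batch discount.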