The effort parameter controls how much thinking Claude does per request. Choose low, medium, or high to balance cost vs. thoroughness.
API Usage
import anthropic
client = anthropic.Anthropic()
# Quick classification - low effort, fast and cheap
response = client.messages.create(
model="claude-sonnet-4-6-20260301",
max_tokens=1024,
thinking={"type": "enabled", "effort": "low"},
messages=[{"role": "user", "content": "Classify this support ticket: 'Can't log in'"}]
)
# Complex architecture review - high effort, thorough
response = client.messages.create(
model="claude-sonnet-4-6-20260301",
max_tokens=8192,
thinking={"type": "enabled", "effort": "high"},
messages=[{"role": "user", "content": "Review this microservice architecture..."}]
)
When to Use Each Level
| Effort | Use Case | Cost Impact |
|---|---|---|
low | Classification, formatting, simple Q&A | ~70% fewer thinking tokens |
medium | Most coding tasks, summaries, analysis | Balanced default |
high | Complex debugging, architecture, math | Full thinking budget |
Pro Tip
For Sonnet, medium effort is the recommended default for most use cases. Only bump to high for problems that genuinely require multi-step reasoning. This alone can cut your API costs without meaningful quality loss.