API Tips intermediate

Use the Effort Parameter to Control Cost and Speed

The effort parameter lets you dial Claude's thinking depth up or down per request, trading thoroughness for speed and token savings.

March 15, 2026

Set effort to low, medium, or high on each request. Lower levels respond faster and spend fewer tokens; higher levels spend more tokens on reasoning and suit harder problems.

API Usage

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Note: effort is passed under output_config, alongside adaptive thinking.
# Confirm the field names against the API reference for your SDK version.

# Quick classification: low effort, fast and cheap
quick_response = client.messages.create(
    model="claude-sonnet-4-6-20260301",
    max_tokens=1024,
    thinking={"type": "adaptive"},
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Classify this support ticket: 'Can't log in'"}],
)

# Complex architecture review: high effort, thorough
review_response = client.messages.create(
    model="claude-sonnet-4-6-20260301",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Review this microservice architecture..."}],
)

When to Use Each Level

Effort	Use Case	Cost Impact
`low`	Classification, formatting, simple Q&A	~70% fewer thinking tokens
`medium`	Most coding tasks, summaries, analysis	Balanced cost and thoroughness; recommended default for Sonnet
`high`	Complex debugging, architecture, math	Full thinking budget
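
The table can be turned into a small lookup so that each request picks its effort level from the kind of task it carries. This is a minimal sketch, not part of the SDK: the task category names, EFFORT_BY_TASK, DEFAULT_EFFORT, and pick_effort are all hypothetical, and falling back to medium for unknown tasks is an assumption.

```python
# Hypothetical task categories, grouped the same way as the table above.
EFFORT_BY_TASK = {
    "classification": "low",
    "formatting": "low",
    "simple_qa": "low",
    "coding": "medium",
    "summary": "medium",
    "analysis": "medium",
    "debugging": "high",
    "architecture": "high",
    "math": "high",
}

# Assumption: unknown task types fall back to the balanced middle level.
DEFAULT_EFFORT = "medium"


def pick_effort(task_type: str) -> str:
    """Return the effort level for a task type, or the default if unknown."""
    key = task_type.strip().lower()
    return EFFORT_BY_TASK.get(key, DEFAULT_EFFORT)


if __name__ == "__main__":
    for task in ("classification", "coding", "architecture", "translation"):
        print(f"{task}: {pick_effort(task)}")
```

The returned string is what you would pass as the effort value in the request. Keeping the mapping in one place makes it easy to audit which workloads are paying for high effort.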

Pro Tip

For Sonnet, medium effort is the recommended default for most use cases. Move to high only for problems that genuinely require multi-step reasoning. Keeping routine requests off high can cut your API costs without meaningful quality loss.
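
To check whether a lower effort level actually saves money on your workload, you can compare the output token counts reported for the same prompt at two levels (the SDK exposes this as response.usage.output_tokens, and thinking tokens are billed as output tokens). The sketch below only does the arithmetic. The helper names are hypothetical, and the token counts and the price per million tokens are made-up placeholders, not measured results or published pricing.

```python
def output_cost(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD for a number of output tokens at a given price."""
    return output_tokens / 1_000_000 * usd_per_million_tokens


def savings_fraction(tokens_high: int, tokens_lower: int) -> float:
    """Fraction of output tokens saved relative to the high-effort run."""
    if tokens_high <= 0:
        raise ValueError("tokens_high must be positive")
    return 1 - tokens_lower / tokens_high


if __name__ == "__main__":
    # Placeholder numbers: substitute usage.output_tokens from your own runs.
    tokens_high = 4000
    tokens_medium = 2500
    price = 15.0  # hypothetical USD per million output tokens

    saved = savings_fraction(tokens_high, tokens_medium)
    print(f"high:   ${output_cost(tokens_high, price):.4f}")
    print(f"medium: ${output_cost(tokens_medium, price):.4f}")
    print(f"tokens saved at medium: {saved:.1%}")
```

Run the comparison on a representative sample of prompts and check answer quality alongside the token counts, since a single prompt says little about the overall trade-off.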