Claude Tips & Tricks

Use the Effort Parameter to Control Cost and Speed

The effort parameter lets you dial Claude's thinking depth up or down per request, trading thoroughness for speed and token savings. Choose low, medium, or high to balance cost against thoroughness.

API Usage

import anthropic

# Reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

# Quick classification: low effort, fast and cheap.
# Effort is passed in output_config; adaptive thinking lets the effort level
# guide how much Claude thinks (requires a recent SDK version).
quick = client.messages.create(
    model="claude-sonnet-4-6-20260301",
    max_tokens=1024,
    thinking={"type": "adaptive"},
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Classify this support ticket: 'Can't log in'"}]
)

# Complex architecture review: high effort, thorough
review = client.messages.create(
    model="claude-sonnet-4-6-20260301",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Review this microservice architecture..."}]
)
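
To see what each effort level actually costs on your own prompts, one option is to compare the output token counts the API reports (thinking tokens are billed as output tokens). The helper below is a sketch under stated assumptions, not part of the SDK: `output_tokens_by_effort` and the `send` callback are hypothetical names, and `send(level)` is assumed to issue one request at the given effort level and return the response. The only response detail it relies on is the `usage.output_tokens` field.

```python
from typing import Any, Callable, Dict, Iterable


def output_tokens_by_effort(
    send: Callable[[str], Any],
    levels: Iterable[str] = ("low", "medium", "high"),
) -> Dict[str, int]:
    """Call send(level) once per effort level and record the output tokens used.

    `send` is a hypothetical callback supplied by the caller: it should make
    one API request at the given effort level and return the response object.
    """
    usage: Dict[str, int] = {}
    for level in levels:
        response = send(level)
        # Messages API responses report output tokens under usage.output_tokens
        usage[level] = response.usage.output_tokens
    return usage
```

In practice, `send` could wrap a `client.messages.create(...)` call with the prompt held fixed and only the effort level varying. A single run is noisy, so averaging over several representative prompts gives a more reliable comparison.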

When to Use Each Level

| Effort | Use Case | Cost Impact |
|--------|----------|-------------|
| low | Classification, formatting, simple Q&A | ~70% fewer thinking tokens |
| medium | Most coding tasks, summaries, analysis | Balanced default |
| high | Complex debugging, architecture, math | Full thinking budget |
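
The table above can be encoded as a small lookup so that each call site picks its effort level by task type instead of hard-coding it. This is a minimal sketch: the task category names and the `pick_effort` helper are hypothetical, and the fallback to medium is an assumption based on the table's "balanced default" row.

```python
# Hypothetical task categories mapped to the effort levels in the table above.
EFFORT_BY_TASK = {
    "classification": "low",
    "formatting": "low",
    "simple_qa": "low",
    "coding": "medium",
    "summary": "medium",
    "analysis": "medium",
    "debugging": "high",
    "architecture": "high",
    "math": "high",
}


def pick_effort(task_type: str, default: str = "medium") -> str:
    """Return the effort level for a task type, falling back to the default."""
    return EFFORT_BY_TASK.get(task_type.strip().lower(), default)
```

The returned string can then be passed as the effort value in the request, which keeps the routing policy in one place and makes it easy to adjust.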

Pro Tip

For Sonnet, medium effort is the recommended default for most use cases. Bump to high only for problems that genuinely require multi-step reasoning, such as complex debugging, architecture, and math. Making medium the default can by itself cut API costs without meaningful quality loss.
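
As a rough way to size that saving, the arithmetic is the difference in output tokens multiplied by the output token price. The helper name below is hypothetical, and every value passed to it (request volume, average token counts, price) is a placeholder to replace with your own measurements and current pricing.

```python
def daily_savings_usd(
    requests_per_day: int,
    avg_tokens_high: float,
    avg_tokens_medium: float,
    usd_per_million_output_tokens: float,
) -> float:
    """Estimate daily savings from running requests at medium instead of high effort."""
    # Tokens avoided per day by not running every request at high effort
    tokens_saved = requests_per_day * (avg_tokens_high - avg_tokens_medium)
    return tokens_saved * usd_per_million_output_tokens / 1_000_000
```

The estimate covers output tokens only; input token costs are unchanged by the effort level, so they drop out of the comparison.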