When using the Claude API, you can enable adaptive thinking mode, which lets Claude automatically decide how much reasoning effort to apply to each request. Simple questions get fast answers; complex problems get deep analysis.
Implementation:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6-20250415",
    max_tokens=8096,
    thinking={
        "type": "adaptive",
        "budget_tokens": 5000,  # max tokens Claude may spend on thinking
    },
    messages=[
        {"role": "user", "content": "Analyze this algorithm's time complexity..."}
    ],
)

# Thinking and the final answer arrive as separate content blocks.
for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
```
When to use adaptive vs. always-on thinking:
| Mode | Best for |
|---|---|
| `"type": "adaptive"` | Mixed workloads where some queries are simple and some are complex |
| `"type": "enabled"` | Workloads where every request needs deep reasoning (math, code analysis) |
**Budget tokens tip:** The `budget_tokens` parameter sets the maximum thinking tokens, not a guaranteed usage. Claude will use fewer tokens when the problem is straightforward. Set it high enough for your hardest cases; you only pay for what Claude actually uses.
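Because the budget must leave room for the final answer inside `max_tokens`, it can help to validate request parameters before sending them. A sketch of such a pre-flight check (the function itself is hypothetical; the constraint that `budget_tokens` must be at least 1024 and strictly less than `max_tokens` follows the API's documented limits):

```python
def validate_thinking_budget(budget_tokens: int, max_tokens: int) -> None:
    """Hypothetical pre-flight check for a thinking-enabled request."""
    # The documented minimum thinking budget is 1024 tokens.
    if budget_tokens < 1024:
        raise ValueError(f"budget_tokens ({budget_tokens}) must be at least 1024")
    # The final answer still needs room within max_tokens, so the
    # thinking budget must be strictly smaller.
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than "
            f"max_tokens ({max_tokens})"
        )

validate_thinking_budget(5000, 8096)  # the values used in the example above
```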
Trade-off: Extended thinking is much more accurate on complex reasoning tasks (math competitions, multi-step code analysis) but adds latency. For latency-sensitive applications, use adaptive mode so you only pay the latency cost when it matters.
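To quantify that latency cost in your own workload, you can wrap the request in a simple timer. A minimal sketch; `timed_call` is a hypothetical helper, and in practice `fn` would be a call to `client.messages.create`:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Hypothetical latency probe: run fn and report elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for an API call; swap in the real request to measure it.
result, seconds = timed_call(lambda: "stub response")
```

Comparing `seconds` across adaptive and always-on configurations on a sample of real queries shows where the extra reasoning time is actually being spent.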