When using the Claude API, you can enable adaptive thinking mode, which lets Claude automatically decide how much reasoning effort to apply to each request. Simple questions get fast answers; complex problems get deep analysis.
Implementation:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6-20250415",
    max_tokens=8096,
    thinking={
        "type": "adaptive",
        "budget_tokens": 5000,  # max tokens Claude may spend on thinking
    },
    messages=[
        {"role": "user", "content": "Analyze this algorithm's time complexity..."}
    ],
)

# Thinking and the final answer arrive as separate content blocks.
for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)
```
When to use adaptive vs. always-on thinking:
| Mode | Best for |
|---|---|
| `"type": "adaptive"` | Mixed workloads where some queries are simple and some are complex |
| `"type": "enabled"` | Workloads where every request needs deep reasoning (math, code analysis) |
**Budget tokens tip:** The `budget_tokens` parameter sets the maximum thinking tokens, not a guaranteed usage. Claude will use fewer tokens when the problem is straightforward. Set it high enough for your hardest cases; you only pay for what Claude actually uses.
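Because the budget must leave room for the final answer inside `max_tokens`, it can help to validate request parameters before sending them. A sketch of such a pre-flight check (the function itself is hypothetical; the constraint that `budget_tokens` must be at least 1024 and strictly less than `max_tokens` follows the API's documented limits):

```python
def validate_thinking_budget(budget_tokens: int, max_tokens: int) -> None:
    """Hypothetical pre-flight check for a thinking-enabled request."""
    # The documented minimum thinking budget is 1024 tokens.
    if budget_tokens < 1024:
        raise ValueError(f"budget_tokens ({budget_tokens}) must be at least 1024")
    # The final answer still needs room within max_tokens, so the
    # thinking budget must be strictly smaller.
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than "
            f"max_tokens ({max_tokens})"
        )

validate_thinking_budget(5000, 8096)  # the values used in the example above
```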
Trade-off: Extended thinking is much more accurate on complex reasoning tasks (math competitions, multi-step code analysis) but adds latency. For latency-sensitive applications, use adaptive mode so you only pay the latency cost when it matters.
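To quantify that latency cost in your own workload, you can wrap the request in a simple timer. A minimal sketch; `timed_call` is a hypothetical helper, and in practice `fn` would be a call to `client.messages.create`:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Hypothetical latency probe: run fn and report elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for an API call; swap in the real request to measure it.
result, seconds = timed_call(lambda: "stub response")
```

Comparing `seconds` across adaptive and always-on configurations on a sample of real queries shows where the extra reasoning time is actually being spent.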