The token counting endpoint tells you exactly how many tokens a message will use before you spend money sending it. The endpoint itself is free.
API Usage
```python
import anthropic

client = anthropic.Anthropic()

token_count = client.messages.count_tokens(
    model="claude-sonnet-4-6-20260301",
    messages=[{
        "role": "user",
        "content": "Your potentially long prompt here..."
    }],
    system="Your system prompt"
)
print(f"Input tokens: {token_count.input_tokens}")

# Budget enforcement
estimated_cost = token_count.input_tokens * 0.000003  # Sonnet input pricing: $3 / MTok
if estimated_cost > 0.10:
    print(f"Warning: ~${estimated_cost:.4f} in input tokens alone")
```
Use Cases
- Dynamic prompts - when user content varies wildly in length
- Budget enforcement - reject requests exceeding a cost threshold
- Model routing - send short, cheap prompts to Opus and long, expensive ones to Sonnet, keeping per-request cost bounded
- Monitoring - track token patterns before they become costs
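The routing and budget cases above reduce to pure functions of the counted token total, so they're easy to test without an API call. A minimal sketch, assuming you've already called `count_tokens`; the thresholds, cost ceiling, and short model aliases here are illustrative assumptions, not official guidance:

```python
# Hypothetical helpers driven by a pre-computed input token count.
SONNET_INPUT_PRICE = 0.000003  # $3 / MTok, per the example above

def choose_model(input_tokens: int) -> str:
    # Assumed threshold: short prompts keep the pricier model affordable;
    # long prompts get routed to the cheaper one.
    if input_tokens < 2_000:
        return "opus"
    return "sonnet"

def within_budget(input_tokens: int, max_cost: float = 0.10) -> bool:
    # Reject requests whose input cost alone would exceed the ceiling.
    return input_tokens * SONNET_INPUT_PRICE <= max_cost
```

In practice you'd call `count_tokens` once, then feed `token_count.input_tokens` through both helpers before committing to a paid `messages.create` call.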
Combined with Prompt Caching
Count tokens first to decide whether caching is worthwhile. Prompts under 1,024 tokens can't benefit at all (that's the minimum cacheable prefix length on most models), while prompts over 10K tokens that repeat across requests are prime candidates.
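That decision can be sketched as a small predicate on the counted total. The `expected_reuses` parameter is a hypothetical input you'd supply from your own traffic estimates; the 1,024-token floor comes from the caching minimum above:

```python
# Sketch: is this prompt worth marking with cache_control?
MIN_CACHEABLE_TOKENS = 1_024  # minimum cacheable prefix length on most models

def worth_caching(input_tokens: int, expected_reuses: int) -> bool:
    # Below the minimum, caching simply doesn't apply; above it, caching
    # only pays off if the prompt will actually be sent more than once.
    return input_tokens >= MIN_CACHEABLE_TOKENS and expected_reuses > 1
```

A 20K-token system prompt reused across a session clears both bars; a one-off 500-token question clears neither.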