Every token in your system prompt is sent with every API call. A 2,000-token system prompt across 50 turns costs you 100K input tokens just in repeated instructions.
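The arithmetic is simple enough to sketch (figures are the article's example of a 2,000-token prompt over 50 turns, plus the 500-token target from the end of this piece):

```python
# Tokens spent re-sending the same system prompt on every turn.
def repeated_cost(prompt_tokens: int, turns: int) -> int:
    return prompt_tokens * turns

print(repeated_cost(2_000, 50))  # 100000 tokens of repeated instructions
print(repeated_cost(500, 50))    # 25000 after trimming to a 500-token prompt
```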
Common Bloat
Most system prompts have these problems:
- Repetition: saying the same rule three different ways
- Examples that could be references: embedding full documents instead of pointing to them
- Defensive instructions: “don’t do X, also don’t do Y, also avoid Z” when one rule covers all three
- Boilerplate: paragraphs of context the model doesn’t need
Before and After
Before (180 tokens):
You are a helpful coding assistant. You should always write clean,
well-documented code. Make sure to add comments to explain complex
logic. Always follow best practices. When writing code, ensure it
is readable and maintainable. Add docstrings to all functions.
After (25 tokens):
Write clean, documented code. Add docstrings. Comment non-obvious logic.
Techniques
- Merge overlapping rules. If three rules all say “be concise,” keep one
- Use shorthand. Claude understands “No yapping” as well as a paragraph about brevity
- Move examples to few-shot messages instead of embedding in the system prompt
- Reference, don’t embed. Say “Follow the style in src/utils/” instead of pasting the whole file
- Delete “be helpful” instructions. Claude already tries to be helpful
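The few-shot technique above can be sketched as plain message construction (the prompt text and example contents here are illustrative placeholders, not a recommended prompt):

```python
# The worked example lives in the message history as a few-shot pair,
# so the system prompt stays small.
system = "Write clean, documented code. Add docstrings."

few_shot = [
    {"role": "user", "content": "Reverse a string."},
    {
        "role": "assistant",
        "content": 'def reverse(s: str) -> str:\n'
                   '    """Return s reversed."""\n'
                   '    return s[::-1]',
    },
]

# The real request appends the new question after the examples.
messages = few_shot + [{"role": "user", "content": "Count vowels in a string."}]
# messages and system are what you would pass to client.messages.create(...)
```

Few-shot messages also play better with prompt caching than examples baked into the system prompt, since the system prompt stays stable as examples change.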
Measure It
import anthropic

client = anthropic.Anthropic()

# The token-counting endpoint requires a model and at least one message,
# so the count includes a small amount of per-message overhead.
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",  # any current model ID works
    system=your_system_prompt,
    messages=[{"role": "user", "content": "hi"}],
)
print(f"System prompt (plus one short message): {count.input_tokens} tokens")
Aim for under 500 tokens for most use cases. If you’re over 1,000, you almost certainly have bloat worth cutting.