Every token in your system prompt is sent with every API call. A 2,000-token system prompt across 50 turns costs you 100K input tokens just in repeated instructions.
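The arithmetic is simple enough to sketch (figures are the article's example of a 2,000-token prompt over 50 turns, plus the 500-token target from the end of this piece):

```python
# Tokens spent re-sending the same system prompt on every turn.
def repeated_cost(prompt_tokens: int, turns: int) -> int:
    return prompt_tokens * turns

print(repeated_cost(2_000, 50))  # 100000 tokens of repeated instructions
print(repeated_cost(500, 50))    # 25000 after trimming to a 500-token prompt
```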
Common Bloat
Most system prompts have these problems:
- Repetition: saying the same rule three different ways
- Examples that could be references: embedding full documents instead of pointing to them
- Defensive instructions: “don’t do X, also don’t do Y, also avoid Z” when one rule covers all three
- Boilerplate: paragraphs of context the model doesn’t need
Before and After
Before (180 tokens):
You are a helpful coding assistant. You should always write clean,
well-documented code. Make sure to add comments to explain complex
logic. Always follow best practices. When writing code, ensure it
is readable and maintainable. Add docstrings to all functions.
After (25 tokens):
Write clean, documented code. Add docstrings. Comment non-obvious logic.
Techniques
- Merge overlapping rules. If three rules all say “be concise,” keep one
- Use shorthand. Claude understands “No yapping” as well as a paragraph about brevity
- Move examples to few-shot messages instead of embedding in the system prompt
- Reference, don’t embed. Say “Follow the style in src/utils/” instead of pasting the whole file
- Delete “be helpful” instructions. Claude already tries to be helpful
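The few-shot technique above can be sketched as plain message construction (the prompt text and example contents here are illustrative placeholders, not a recommended prompt):

```python
# The worked example lives in the message history as a few-shot pair,
# so the system prompt stays small.
system = "Write clean, documented code. Add docstrings."

few_shot = [
    {"role": "user", "content": "Reverse a string."},
    {
        "role": "assistant",
        "content": 'def reverse(s: str) -> str:\n'
                   '    """Return s reversed."""\n'
                   '    return s[::-1]',
    },
]

# The real request appends the new question after the examples.
messages = few_shot + [{"role": "user", "content": "Count vowels in a string."}]
# messages and system are what you would pass to client.messages.create(...)
```

Few-shot messages also play better with prompt caching than examples baked into the system prompt, since the system prompt stays stable as examples change.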
Measure It
import anthropic

client = anthropic.Anthropic()

# The token-counting endpoint requires a model and at least one message,
# so the count includes a small amount of per-message overhead.
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",  # any current model ID works
    system=your_system_prompt,
    messages=[{"role": "user", "content": "hi"}],
)
print(f"System prompt (plus one short message): {count.input_tokens} tokens")
Aim for under 500 tokens for most use cases. If you’re over 1,000, you almost certainly have bloat worth cutting.