
Generate Incident Response Runbooks with Claude

Have Claude analyze your infrastructure and codebase to produce runbooks for common failure modes, complete with diagnostic commands.

March 16, 2026

When things break at 3am, nobody wants to think from first principles. Generate runbooks in advance.

Generate from Your Codebase

Analyze our infrastructure code in terraform/ and our services in src/.
For each service, generate a runbook covering:
1. Health check endpoints and how to verify they work
2. Common failure modes (DB connection, memory, disk)
3. Diagnostic commands to run for each failure
4. Recovery steps
5. Escalation criteria

Output as markdown files in docs/runbooks/.
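You may want every runbook to start from the same five sections before Claude fills them in. If so, a short script can create the stubs first. This is a minimal sketch that assumes every immediate subdirectory of src/ is one service. The function name `scaffold_runbooks` and the section titles are hypothetical; neither Claude nor the prompt requires them.

```python
from pathlib import Path

# Hypothetical section titles mirroring the five items requested in the prompt.
SECTIONS = [
    "Health Check",
    "Failure Modes",
    "Diagnostics",
    "Recovery",
    "Escalation",
]


def scaffold_runbooks(src_dir: Path, out_dir: Path) -> list[Path]:
    """Create one stub runbook per service directory and return the new files.

    Assumption: each immediate subdirectory of src_dir is a single service.
    Files that already exist are left untouched.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for service in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        target = out_dir / f"{service.name}.md"
        if target.exists():
            continue  # never overwrite a runbook that has already been written
        lines = [f"# {service.name} Runbook", ""]
        for heading in SECTIONS:
            lines += [f"## {heading}", "", "TODO", ""]
        target.write_text("\n".join(lines), encoding="utf-8")
        created.append(target)
    return created
```

Run it from the repository root as `scaffold_runbooks(Path("src"), Path("docs/runbooks"))`, then point Claude at the stubs. Existing files are skipped, so a rerun never clobbers a finished runbook.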

Template Structure

Claude generates runbooks like this:

# API Service Runbook

## Health Check
curl -fsS https://api.example.com/health

## Symptoms: 5xx Spike
1. Check pod status: `kubectl get pods -l app=api`
2. Check recent deploys: `kubectl rollout history deployment/api`
3. Check DB connectivity: `kubectl exec deploy/api -- pg_isready -h <db-host>`
4. If the DB is unreachable: check the connection config in `terraform/rds.tf`

## Recovery: Rollback
kubectl rollout undo deployment/api

## Escalation
If not resolved within 15 minutes, page the on-call SRE.
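A health check is only useful if whatever runs it can detect a failure. When verifying the endpoint from a script, count anything other than a 2xx response within a timeout as unhealthy. Below is a minimal sketch using only the Python standard library. The name `check_health` and the 5-second default timeout are assumptions for illustration.

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (healthy, detail) for a health endpoint.

    Healthy means a 2xx response within `timeout` seconds. A non-2xx status,
    DNS failure, refused connection, or timeout all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx status that urllib did not follow as a redirect.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # URLError, refused connections, and socket timeouts all derive from OSError.
        return False, f"unreachable: {err}"
    # Defensive: urlopen normally raises HTTPError for anything outside 2xx.
    return 200 <= status < 300, f"HTTP {status}"
```

`HTTPError` is caught first because it is itself a subclass of `URLError`, which in turn is an `OSError`.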

Keep Them Updated

Compare docs/runbooks/ against the current infrastructure code.
Flag any runbooks that reference services, commands, or configs
that no longer exist. Update them.
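Part of that comparison can be automated cheaply. The sketch below catches only one kind of drift: file paths written in backticks (such as `terraform/rds.tf`) that no longer exist in the repo. Stale commands and renamed services still need Claude's review. The regex heuristic and the name `find_stale_refs` are assumptions.

```python
import re
from pathlib import Path

# Hypothetical heuristic: a backticked span made of path segments joined by "/"
# and ending in a file extension, e.g. `terraform/rds.tf`. Commands with spaces
# such as `kubectl get pods` do not match.
PATH_REF = re.compile(r"`([\w.-]+(?:/[\w.-]+)+\.\w+)`")


def find_stale_refs(repo_root: Path, runbook_dir: Path) -> dict[str, list[str]]:
    """Map each runbook filename to the referenced paths missing from repo_root."""
    stale: dict[str, list[str]] = {}
    for runbook in sorted(runbook_dir.glob("*.md")):
        text = runbook.read_text(encoding="utf-8")
        missing = sorted(
            {ref for ref in PATH_REF.findall(text) if not (repo_root / ref).exists()}
        )
        if missing:
            stale[runbook.name] = missing
    return stale
```

An empty dict means every referenced path still exists. A non-empty one gives Claude a concrete list to start the update from.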

Tip

Store runbooks in the repo alongside the code they describe. When the code changes, Claude can update the runbook in the same PR.
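Because the runbooks live next to the code, a CI step can remind reviewers when a PR touches terraform/ or src/ but not docs/runbooks/. This is a minimal sketch. The function name, the default prefixes, and running it in CI are all assumptions. Many code changes legitimately need no runbook update, so treat a `True` result as a warning rather than a failure.

```python
def runbook_update_missing(
    changed_files: list[str],
    code_prefixes: tuple[str, ...] = ("terraform/", "src/"),  # hypothetical defaults
    runbook_prefix: str = "docs/runbooks/",
) -> bool:
    """True if the change set touches infrastructure or service code but no runbook."""
    touches_code = any(f.startswith(code_prefixes) for f in changed_files)
    touches_runbook = any(f.startswith(runbook_prefix) for f in changed_files)
    return touches_code and not touches_runbook
```

Feed it the changed paths, for example the lines printed by `git diff --name-only origin/main...HEAD`.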