Fresh Claude Code sessions aren't hitting cache


Your fresh Claude Code sessions aren’t hitting cache. Here’s what it’s costing you.

Source: GitHub issue anthropics/claude-code#47098 [2], filed by user wadabum with traffic-capture evidence that fresh sessions were not hitting the cache. Chris Nighswonger (GitHub handle cnighswonger, author of the claude-code-cache-fix npm interceptor [5]) analyzed the request payload and identified the mechanism in a comment on the issue. Developer ZWhiteTrace independently instrumented 159 local sessions and posted a version-by-version hit-rate table in a follow-up comment on the same issue; the analysis script is a public gist [3].

Prompt caching only works if the cache prefix matches exactly, byte for byte, across turns [1]. Per cnighswonger’s analysis [2], Claude Code places your CLAUDE.md, installed skills, and tool schemas inside the messages[0] user-content blocks rather than the stable system[] prefix. That analysis also identifies whitespace changes, ordering shuffles, and dynamic snippets inside the block as the specific triggers: any one invalidates cache for everything after it in the request [2].
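To make the prefix rule concrete, here is a minimal sketch of byte-exact prefix matching. It is a toy model, not Anthropic's cache implementation: the block contents, the function names, and the use of SHA-256 are all assumptions chosen for illustration. It shows how a single extra newline in an early block leaves nothing after that block reusable.

```python
import hashlib

def prefix_digests(blocks):
    """Return one SHA-256 digest per cumulative prefix of `blocks`."""
    h = hashlib.sha256()
    digests = []
    for block in blocks:
        data = block.encode("utf-8")
        # Length-prefix each block so block boundaries are part of the hash.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
        digests.append(h.copy().hexdigest())
    return digests

def reusable_blocks(previous, current):
    """Count leading blocks whose cumulative prefix is byte-identical."""
    count = 0
    for a, b in zip(prefix_digests(previous), prefix_digests(current)):
        if a != b:
            break
        count += 1
    return count

# Hypothetical request layouts; only the second block differs, by one newline.
turn_1 = ["system prompt", "CLAUDE.md contents\n", "skills list", "user question"]
turn_2 = ["system prompt", "CLAUDE.md contents\n\n", "skills list", "user question"]

print(reusable_blocks(turn_1, turn_1))  # all four blocks match
print(reusable_blocks(turn_1, turn_2))  # only the first block matches
```

The earlier the jittering block sits in the request, the less of the prefix survives, which is why content in messages[0] is so costly to get wrong.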

ZWhiteTrace’s instrumentation data [2]:

Claude Code version | % of first turns in a fresh session with cache_read == 0
2.1.94 | 90.9%
2.1.100 | 40.6%
2.1.104 | 28.6%

Sampling was uncontrolled per the source author; treat the rates as directional, not precise.

Rates are improving across versions, but on 2.1.104 roughly 3 in 10 new sessions still pay full uncached price on turn 1 even when a warm cache exists from a recently closed session. A cousin issue (#47107) [4] reports git status output at the top of the system prompt invalidating the cache from the front for the same reason. The prefix is invalidated from both ends.

What the bug costs you

Assume Sonnet 4.6, a 20,000-token shared prefix, and a 5-turn session. Pricing: $3/M base input, $3.75/M 5-minute cache write, $0.30/M cache read [1].

If caching worked the way it’s supposed to (write once, read on every subsequent turn):

Event | Tokens | Rate | Cost
Turn 1, cache write | 20,000 | $3.75/M | $0.075
Turns 2–5, 4 cache reads | 80,000 | $0.30/M | $0.024
Total, 5-turn session | | | $0.099

With messages[0] jitter busting the prefix between turns, each follow-up turn either re-writes the cache or falls through to base input. Worst case (re-write every turn):

Event | Tokens | Rate | Cost
5 turns × 20k re-write | 100,000 | $3.75/M | $0.375

The session pays $0.375 instead of ~$0.10, roughly 3.8× on a 5-turn session.
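The arithmetic behind the ideal and worst-case figures can be reproduced with a short sketch. The rates are the ones quoted above [1]; the function name and parameters are illustrative, and the calculation covers only the shared prefix, not the rest of each request.

```python
# Sonnet 4.6 rates quoted in this article, in dollars per million tokens.
CACHE_WRITE_PER_M = 3.75
CACHE_READ_PER_M = 0.30

def prefix_cost(prefix_tokens, writes, reads):
    """Dollar cost of writing the prefix `writes` times and reading it `reads` times."""
    write_cost = prefix_tokens * writes * CACHE_WRITE_PER_M
    read_cost = prefix_tokens * reads * CACHE_READ_PER_M
    return (write_cost + read_cost) / 1_000_000

ideal = prefix_cost(20_000, writes=1, reads=4)  # write once, read on turns 2-5
worst = prefix_cost(20_000, writes=5, reads=0)  # re-write on every turn

print(f"ideal ${ideal:.3f}, worst ${worst:.3f}, ratio {worst / ideal:.1f}x")
```

Swapping in your own prefix size and turn count gives the per-session figure for your setup.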

Cross-session cost

Most real-world Claude Code usage is short sessions, not long ones: every claude <prompt> one-shot, every cron job, every /run-epic invocation, every fresh terminal. If the cache is supposed to warm up once and be reused across sessions within 5 minutes:

  • Ideal (1 write, 4 reads across 5 fresh sessions): $0.099
  • Bug (every new session re-writes): 5 × $0.075 = $0.375, same 3.8× multiplier

Check your own prefix size. Look at cache_creation_input_tokens on your first call. A bare install with default tools and a short CLAUDE.md runs ~10–15k. Power users with Anthropic’s skills, a few MCPs, and a richer CLAUDE.md sit at 20–30k. Heavy setups with symlinked rules across projects can hit 40–60k+. Every extra 1,000 tokens in the prefix multiplies through every session start.

Multiplied out (20k prefix, 10 fresh sessions/day, with the ideal case assuming each session starts while the cache is still warm):

  • Ideal (1 write, 9 reads): ~$0.13 / day in prefix costs
  • Bug (every session re-writes): ~$0.75 / day
  • Extra: +$0.62 / day, ~$18.60 / month per developer

At a 40k prefix: ~$37 / month per developer. At 60k: ~$56 / month. Small teams accumulate the cost across developers; cron-heavy or CI automation setups multiply it further because sessions start programmatically.
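The per-developer estimates can be recomputed for any prefix size with a sketch like the following. It assumes a 30-day month and the same idealized baseline as above (one write, every other session a cache read); the function name and defaults are illustrative. Results land a few cents away from the rounded figures in the text because the text rounds the daily number first.

```python
# Sonnet 4.6 rates quoted in this article, in dollars per million tokens.
CACHE_WRITE_PER_M = 3.75
CACHE_READ_PER_M = 0.30

def monthly_extra(prefix_tokens, sessions_per_day=10, days=30):
    """Extra monthly prefix cost when every fresh session re-writes the cache."""
    per_million = prefix_tokens / 1_000_000
    # Ideal: first session writes, the remaining sessions read.
    ideal_day = per_million * (CACHE_WRITE_PER_M + (sessions_per_day - 1) * CACHE_READ_PER_M)
    # Bug: every session pays the write rate.
    bug_day = per_million * sessions_per_day * CACHE_WRITE_PER_M
    return (bug_day - ideal_day) * days

for prefix in (20_000, 40_000, 60_000):
    print(f"{prefix:>6} tokens: ${monthly_extra(prefix):.2f} extra per month")
```

Raising sessions_per_day to match a cron or CI workload shows how quickly programmatic session starts dominate the total.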

How to tell if it’s you

  • OpenTelemetry / Jaeger: turn-1 spans of fresh sessions show cache_read_input_tokens == 0 and cache_creation_input_tokens equal to (roughly) your full prefix.
  • Install claude-code-cache-fix as a measurement tool: npm install -g claude-code-cache-fix@latest (v1.11.0 as of 2026-04-15) then run with CACHE_FIX_DEBUG=1 [5]. It logs per-section prompt sizes to show which block is jittering.
  • Reproduce with ZWhiteTrace’s setup [3] and compare your hit rate to the table above.
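Whichever route you take, the check itself is simple once you have the usage record from the first turn of each fresh session. The sketch below assumes you have already extracted those records as dictionaries carrying the cache_read_input_tokens and cache_creation_input_tokens fields; how you extract them depends on your logging setup and is not shown. The sample records are made up for illustration, and this is not ZWhiteTrace's script [3].

```python
def is_cold_start(usage):
    """True when the first turn read nothing from cache."""
    return (usage.get("cache_read_input_tokens") or 0) == 0

def first_turn_miss_rate(first_turn_usages):
    """Fraction of first turns with cache_read_input_tokens == 0, or None if empty."""
    usages = list(first_turn_usages)
    if not usages:
        return None
    return sum(is_cold_start(u) for u in usages) / len(usages)

# Illustrative first-turn usage records, one per fresh session (not real data).
sample = [
    {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 20_000},
    {"cache_read_input_tokens": 19_500, "cache_creation_input_tokens": 500},
    {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 20_100},
    {"cache_read_input_tokens": 20_000, "cache_creation_input_tokens": 0},
]

rate = first_turn_miss_rate(sample)
print(f"{rate:.1%} of first turns had cache_read == 0")
```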

Workaround

  1. Upgrade Claude Code. Each recent release has reduced the miss rate; update to the latest stable version.
  2. Reduce jitter in your messages[0] block. Audit your CLAUDE.md and skills for anything that changes between turns (timestamps, directory listings, todo snapshots). Move dynamic content into tool results or later message blocks where it won’t invalidate the prefix.
  3. Run claude-code-cache-fix for a measurement pass; don’t leave it on as a permanent layer.
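For step 2, one way to locate jitter is to capture the prefix text from two consecutive requests and find the first byte where they differ. The sketch below assumes you already have the two captures as bytes; the snapshot contents, including the generated timestamp line, are hypothetical.

```python
def first_divergence(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if identical."""
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            return i
    # One capture is a strict prefix of the other, or they are identical.
    return None if len(a) == len(b) else limit

def show_context(data: bytes, index: int, width: int = 20):
    """Return the bytes around `index` so the jittering content is visible."""
    return data[max(0, index - width): index + width]

# Hypothetical captures of the same CLAUDE.md block on two consecutive turns.
snap_1 = b"# Project rules\nGenerated: 2026-04-15T10:00\nUse pytest.\n"
snap_2 = b"# Project rules\nGenerated: 2026-04-15T10:05\nUse pytest.\n"

idx = first_divergence(snap_1, snap_2)
if idx is None:
    print("prefix is byte-identical")
else:
    print(f"diverges at byte {idx}: {show_context(snap_1, idx)!r}")
```

Everything from the reported offset onward is what stops being cacheable, so the fix is to move whatever generates that content out of the prefix.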

What to watch for

The fix has to land inside Claude Code: move CLAUDE.md + skills + tool schemas into the stable system[] prefix, or lock down ordering and whitespace in messages[0]. Track #47098 [2] and #47107 [4] for movement. Together with the Agent SDK subagent caching gap in #29966, this means both the main-session prefix and the subagent path keep leaking until fixes for both land.

Fact Check
  1. Anthropic prompt caching docs: opt-in cache_control mechanism, 5-min vs 1-hour TTL, Sonnet 4.6 pricing ($3/M base input, $3.75/M 5-min write, $0.30/M cache read as of 2026-04-16), byte-exact prefix-matching requirement
  2. anthropics/claude-code#47098: new sessions never hit full cache; cnighswonger’s structural analysis that CLAUDE.md + skills + tools land in messages[0] user-content blocks where any jitter invalidates everything after
  3. ZWhiteTrace analysis script: Python script for measuring first-turn cache_read == 0 rates from local Claude Code session logs; the version-rate figures (90.9% on 2.1.94 → 28.6% on 2.1.104, n=159) appear in ZWhiteTrace’s comment on issue #47098
  4. anthropics/claude-code#47107: git status in system prompt invalidating cache from the front (companion issue to #47098)
  5. claude-code-cache-fix npm package: cnighswonger’s interceptor for measuring per-section prompt sizes; current version 1.11.0 as of 2026-04-15, run with CACHE_FIX_DEBUG=1