Claude Code v2.1.108: two cache regressions fixed, two warnings added, five issues unaddressed


Update 2026-04-16: Community reports on anthropics/claude-code#46917 indicate v2.1.108 also resolves a second regression: inflated cache_creation_input_tokens on versions 2.1.100–2.1.107, where the CLI rewrote cache content that did not need rewriting. Controlled datapoint from the thread: ~49,700 cache_creation tokens on v2.1.98, ~96,539 on v2.1.104 on the same test. Fix not yet confirmed by Anthropic; issue still open. If you benchmarked Claude Code on 2.1.100–2.1.107 against a pre-regression baseline and saw ~60–100% higher cache_creation on the same workload, upgrade to v2.1.108 or later and re-measure.

Claude Code v2.1.108 patches two prompt-caching regressions [1][2], adds warnings on two more issues, and leaves five structural cost problems unaddressed.

The telemetry-gated regression: with DISABLE_TELEMETRY set, the prompt cache TTL was downgrading from the 1-hour subscriber default to 5 minutes with no log or warning [1]. The 1h rollout is gated on a feature flag fetched via telemetry; with telemetry off, the gate never loaded and the downgrade happened. cache_creation_input_tokens ran higher on every session that idled past 5 minutes, because each gap forced a fresh write.

What the downgrade cost

Take a typical setup: Sonnet 4.6 [1], a 20,000-token shared prefix (Claude Code’s system prompt + a modest CLAUDE.md + a handful of tool schemas), and a workday where turns gap past 5 minutes often enough that each gap forces a fresh cache write.

Check your own prefix size. Look at cache_creation_input_tokens on your first main-session call after a fresh start: that number is your shared prefix. A bare Claude Code install with default tools and a short CLAUDE.md runs ~10–15k. A power user with skills, a few MCPs, and a richer CLAUDE.md sits at 20–30k. Setups with symlinked shared rules across projects, many MCPs, and heavy skills can hit 40–60k+.

| | Pre-upgrade (5m TTL) | Post-upgrade (1h TTL) |
|---|---|---|
| Cache writes per hour | ~12 | 1 |
| Write cost per hour (20k × $3.75/M) | $0.90 | $0.075 |
| Over an 8-hour workday | $7.20 | $0.60 |

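At a 40k prefix those numbers double. At 60k, they triple. Reads are unaffected; the 12× drop is on writes only. Both columns price each write at the 5-minute rate of $3.75/M; priced at the 1-hour write rate of 2× base ($6/M), the post-upgrade figures come to $0.12 per hour and $0.96 per workday, a 7.5× drop in write cost.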
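The table's arithmetic can be reproduced with a short sketch. It follows the table's own assumptions: every write is priced at $3.75 per million tokens, with 12 writes per hour on the 5m TTL and 1 on the 1h TTL. The function and constant names are illustrative, not part of Claude Code.

```python
# Sketch of the write-cost table above, under the table's stated assumptions.
WRITE_RATE_PER_MTOK = 3.75  # USD per million tokens, 5-minute cache write on Sonnet 4.6 [1]
HOURS_PER_DAY = 8           # the 8-hour workday used in the table


def write_cost(prefix_tokens: int, writes_per_hour: int) -> tuple[float, float]:
    """Return (USD per hour, USD per 8-hour day) for cache writes only."""
    per_write = prefix_tokens / 1_000_000 * WRITE_RATE_PER_MTOK
    per_hour = per_write * writes_per_hour
    return per_hour, per_hour * HOURS_PER_DAY


for prefix in (20_000, 40_000, 60_000):
    before_hour, before_day = write_cost(prefix, writes_per_hour=12)  # 5m TTL
    after_hour, after_day = write_cost(prefix, writes_per_hour=1)     # 1h TTL
    print(f"{prefix:>6} tokens: ${before_hour:.3f}/h -> ${after_hour:.3f}/h, "
          f"${before_day:.2f}/day -> ${after_day:.2f}/day")
```

Substitute your own prefix size, taken from cache_creation_input_tokens on the first call after a fresh start, to estimate what the downgrade cost on your setup.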

The table models the TTL downgrade in isolation. Sessions running 2.1.100–2.1.107 also carried #46917’s cache_creation inflation [2] on top, so actual pre-upgrade bills on those versions ran higher than the 5m-TTL math alone suggests. The post-upgrade column still holds.

The universal 1h flag

The 1-hour opt-in was previously Bedrock-only, via ENABLE_PROMPT_CACHING_1H_BEDROCK (still honored, now deprecated). API key, Vertex, and Foundry users can now set the universal ENABLE_PROMPT_CACHING_1H flag directly:

# 1-hour TTL, now universal
export ENABLE_PROMPT_CACHING_1H=1

# Or force 5m for bursty work
export FORCE_PROMPT_CACHING_5M=1

1h cache writes cost 2× base versus 1.25× base for 5-minute [1]. 1h is cheaper than 5m only when the idle pattern forces repeated 5m re-writes: long sessions, gaps between turns, stable prefix. For bursty work that stays warm inside a 5m window, FORCE_PROMPT_CACHING_5M is the cheaper setting.
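The break-even point follows from the two multipliers. The sketch below is a simplified model, not Anthropic's billing logic: it assumes the 1h TTL needs one write per hour, the 5m TTL needs one write per idle gap past 5 minutes (plus the initial write), and the prefix is the same size either way, so it cancels out. The function name is hypothetical.

```python
# Simplified break-even model for cache-write cost, in multiples of base input price.
WRITE_MULTIPLIER_5M = 1.25  # 5-minute cache write, relative to base input [1]
WRITE_MULTIPLIER_1H = 2.0   # 1-hour cache write, relative to base input [1]


def cheaper_ttl(writes_per_hour_at_5m: int) -> str:
    """Return which TTL has the lower hourly write cost under this model.

    Assumes a single write per hour on the 1h TTL and the given number of
    writes per hour on the 5m TTL.
    """
    cost_5m = WRITE_MULTIPLIER_5M * writes_per_hour_at_5m
    cost_1h = WRITE_MULTIPLIER_1H * 1
    return "1h" if cost_1h < cost_5m else "5m"


for writes in (1, 2, 3, 12):
    print(f"{writes:>2} write(s)/hour on 5m TTL -> cheaper setting: {cheaper_ttl(writes)}")
```

Under these assumptions the 1h TTL wins as soon as the 5m TTL would force two or more writes in the same hour (2 × 1.25 = 2.5 > 2). A session that stays warm and writes once is cheaper on 5m.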

One thing this flag does not touch: subagent prefixes are deliberately capped at 5-minute TTL by Anthropic regardless of the setting. If your cost worry is subagent dispatches, this knob isn’t the lever. The structural fixes in the next section are.

Two new warnings

/model now prompts before a mid-conversation model switch, because the next response after a switch re-reads the full conversation history uncached. Startup now warns when a DISABLE_PROMPT_CACHING* variable is set. In both cases the behavior is unchanged; only the warning is new.

How to tell if 2.1.108 affected you

Three signals:

  • You had DISABLE_TELEMETRY set across long-session work. Your cache_creation_input_tokens on those sessions was inflated. Compare a pre-upgrade week to a post-upgrade week on a stable-prefix workload: there should be a step-function drop.
  • You’re on Vertex, Foundry, or API key and the 1h cache was unavailable before. It is now available via ENABLE_PROMPT_CACHING_1H=1.
  • You see a /model switch warning post-upgrade. The warning is new; the behavior behind it predates the upgrade. The re-read cost was already landing, just without a prompt.
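The first signal, like the re-measurement suggested in the update at the top, comes down to a relative change in cache_creation_input_tokens between two runs of the same workload. A minimal sketch, using the controlled datapoint from #46917 as input; the helper names are hypothetical and the 60–100% band is the rough range quoted in the update, not a formal threshold:

```python
def pct_change(baseline_tokens: int, measured_tokens: int) -> float:
    """Percent change in cache_creation_input_tokens relative to the baseline."""
    return (measured_tokens - baseline_tokens) / baseline_tokens * 100


def in_reported_inflation_band(pct: float) -> bool:
    """True if the increase falls in the ~60-100% range reported for 2.1.100-2.1.107."""
    return 60 <= pct <= 100


# Controlled datapoint from the thread: v2.1.98 vs v2.1.104 on the same test.
inflation = pct_change(49_700, 96_539)
print(f"{inflation:.0f}% higher, in reported band: {in_reported_inflation_band(inflation)}")
```

Run the same comparison with a pre-upgrade and a post-upgrade measurement swapped in; a fix shows up as a large negative percentage on an otherwise stable workload.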

Five issues unaddressed

v2.1.108 doesn’t touch any of these:

  • Subagent breakpoint bug (anthropics/claude-code#29966). The Agent SDK defaults enablePromptCaching to false on subagent spawns, which leaves cache_control markers off the request. Anthropic disputes the root cause in the issue thread; the reporter’s proxy logs show the markers missing regardless. Either way, cacheable prefix content gets re-read at base input rate on every dispatch. Still open.
  • MCP tool prefix churn. Subagent spawns ship thousands of tokens of Gmail, Slack, or other MCP tool definitions the subagent never calls. MCP server reconnection invalidates the cache from position 1. Fix lives in each agent’s tools: frontmatter. See The 7,000-token tax on every spawn.
  • .claude/rules/ invisible to subagents. Shared rules placed where subagents can’t read them don’t cache. See Skills vs rules vs reads.
  • Content placement in agent bodies. Responsible for the 63–87% agent token reductions documented in Agent token cuts, 63–87%. Unchanged.
  • Dynamic content in system prompts. Hook-injected timestamps and similar dynamic values invalidate the prefix on every turn. Unmentioned in the release notes.
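Several of these items share one mechanism: prompt caching matches on an identical prefix, so any change early in the request discards everything after it. The toy model below illustrates that with a list of named segments. It is an assumption-laden sketch: the real cache operates on tokens and cache_control breakpoints, and the segment names are made up for illustration.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count leading segments that are identical between two consecutive requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Hypothetical request layouts: a hook-injected timestamp first vs. last.
timestamp_first_a = ["time: 10:00", "system rules", "tool schemas"]
timestamp_first_b = ["time: 10:06", "system rules", "tool schemas"]
timestamp_last_a = ["system rules", "tool schemas", "time: 10:00"]
timestamp_last_b = ["system rules", "tool schemas", "time: 10:06"]

print(shared_prefix_len(timestamp_first_a, timestamp_first_b))  # no reusable segments
print(shared_prefix_len(timestamp_last_a, timestamp_last_b))    # stable segments still match
```

In this model, moving the dynamic value behind the stable content keeps the stable segments reusable, which is the same reasoning behind keeping MCP tool definitions stable across spawns.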

What to watch for

v2.1.108 addresses delivery-level bugs. The remaining cost drivers are structural: subagent breakpoints, tool prefix stability, rules placement, dynamic content. Track anthropics/claude-code#29966 for the subagent breakpoint issue.

Fact Check
  1. Claude Code v2.1.108 release notes: DISABLE_TELEMETRY fix, ENABLE_PROMPT_CACHING_1H going universal, /model and DISABLE_PROMPT_CACHING* startup warnings, Sonnet 4.6 cache pricing ($3/M base input, $3.75/M 5-min write, $0.30/M cache read as of 2026-04-15)
  2. anthropics/claude-code#46917: cache_creation inflation on v2.1.100–2.1.107. Fix appears to have landed in v2.1.108 per community reports (user chrisvaillancourt, 2026-04-15). Not yet formally confirmed by Anthropic. Issue still open as of 2026-04-16.