Token Miser is a blog about one thing: getting more done with LLMs for less money. Not less capability — less spend. There's a difference, and finding it is the whole game.
The API bill doesn't care how impressive your prompts are. It charges by the token. So we write about the things that actually move the number: prompt compression, model selection, context caching, batching, provider routing, and every other trick that shaves cost without degrading results. We show our math, we date our benchmarks, and we resist the urge to declare anything "best" when pricing changes every quarter.
This is for the engineer debugging a runaway cost spike at 2am, the developer who discovered that their RAG pipeline is 80% redundant tokens, and the finance person who got tagged in a Slack thread about an API bill they had no idea existed. You don't need an intro to transformers. You need a number that's smaller than last month's number.
We publish when we have something worth saying, which usually happens right after a new model drops and everyone recalculates their budgets. Our first deep dive covers Claude Code agent architecture and how cache ordering can cut your per-agent costs by 60–87%. If you're running multi-agent pipelines, start there.
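The intuition behind that range is simple arithmetic. A rough sketch, assuming a provider that bills cache reads at 10% of the base input rate (discounts and cache-write surcharges vary by provider, and the surcharge is ignored here for simplicity; `cached_input_multiplier` is a hypothetical helper, not a library function):

```python
def cached_input_multiplier(hit_rate: float, cached_rate: float = 0.10) -> float:
    """Effective input-cost multiplier when a fraction `hit_rate` of input
    tokens is served from the prompt cache at `cached_rate` x base price.
    The 0.10 default is an assumption; check your provider's pricing."""
    return hit_rate * cached_rate + (1.0 - hit_rate)

# If stable context (system prompt, tool definitions, shared docs) is ordered
# first so 90% of each agent's input tokens hit the cache:
savings = 1.0 - cached_input_multiplier(0.90)
print(round(savings, 2))  # 0.81 -> 81% off input spend
```

At a 90% hit rate the input bill drops by 81%, squarely inside the 60–87% range; the spread comes from how much of each agent's context is actually stable enough to cache.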