Agency · 17 June 2026 · 3 min read
Why your AI coding agents cost so much (and how to cut it)
If your agent token bill keeps climbing, the model price usually isn't the problem. The waste is in how much context you pay for on every call. Here are the real cost drivers, and the levers that actually move them.
The short answer
Mostly context, not the model price. The biggest driver is agents rereading the codebase every session instead of being served the current canonical answer; serving that answer directly cuts the tokens per lookup by roughly 60% in our own tooling. Bloated context and rework caused by stale context add to the bill, and all of it multiplies across a fleet.

Short version: when an agent token bill climbs, the model's per-token price is rarely the real problem. The cost is in context: how much an agent has to read, and reread, to do anything. The biggest driver is agents rereading large parts of the codebase every session instead of being served the current canonical answer. Next comes bloated context on every call, and then rework, when an agent acts on stale context and has to redo the task. Run a fleet and each of those multiplies. The levers that actually cut the bill are about context and rework, not switching to a cheaper model.
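To make the multiplication concrete, here is a rough back-of-the-envelope sketch. Every number in it (fleet size, lookups per day, tokens per lookup, rework rate, price) is a hypothetical placeholder, not a measurement; only the roughly 60% reduction in tokens per lookup comes from the figure quoted above. The `Workload` class and `daily_cost` function are names invented for this illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of an agent fleet's daily context usage."""
    agents: int                # fleet size
    lookups_per_agent: int     # context lookups per agent per day
    tokens_per_lookup: int     # input tokens read per lookup
    rework_rate: float         # fraction of lookups redone after acting on stale context
    price_per_million: float   # USD per million input tokens


def daily_cost(w: Workload) -> float:
    """Estimate daily input-token spend in USD under this simple model."""
    # Rework means some lookups are paid for twice.
    lookups = w.agents * w.lookups_per_agent * (1 + w.rework_rate)
    tokens = lookups * w.tokens_per_lookup
    return tokens / 1_000_000 * w.price_per_million


# Placeholder numbers, chosen only to show how the factors multiply.
baseline = Workload(
    agents=10,
    lookups_per_agent=200,
    tokens_per_lookup=20_000,
    rework_rate=0.25,
    price_per_million=3.0,
)

# Lever 1: serve the canonical answer, about 60% fewer tokens per lookup.
served = replace(baseline, tokens_per_lookup=8_000)

# Lever 2: fresher context also means less rework (assumed drop to 5%).
served_less_rework = replace(served, rework_rate=0.05)

for label, w in [
    ("baseline", baseline),
    ("served canonical answer", served),
    ("served + less rework", served_less_rework),
]:
    print(f"{label}: ${daily_cost(w):.2f} per day")
```

Swap in your own fleet size, lookup counts, and pricing. The point of the sketch is only that tokens per lookup and the rework rate sit inside a product with fleet size, so a change to either one scales with every agent you run.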