Agency | 17 June 2026 | 3 min read

Why your AI coding agents cost so much (and how to cut it)

If your agent token bill keeps climbing, the model price usually isn't the problem. The waste is in how much context you pay for on every call. Here are the real cost drivers, and the levers that actually move them.

The short answer

Mostly context, not the model price. The biggest driver is agents rereading the codebase every session instead of being served the current canonical answer; serving it instead cuts the tokens per lookup by roughly 60% in our own tooling. Bloated context and rework from stale context multiply the bill across a fleet.

tsukumo

Short version: when an agent token bill climbs, the model's per-token price is rarely the real problem. The cost is in context: how much an agent has to read, and reread, to do anything. The biggest driver is agents rereading large parts of the codebase every session instead of being served the current canonical answer. Next come bloated context on every call and rework when an agent acts on stale context and has to redo the job. Run a fleet and each of those multiplies. The levers that actually cut the bill are about context and rework, not switching to a cheaper model.

The model price is the small number

It's natural to look at the per-token rate and shop for a cheaper model. That's usually optimizing the small number. What you actually pay for is volume: how many tokens move through the agent to get a unit of work done. A cheaper model that needs more retries, or that you still feed the same bulky context, can cost more, not less. Start with the volume, not the rate.
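To make the volume-versus-rate point concrete, here is a minimal sketch of the arithmetic. Every number in it (token counts, prices, attempt counts) is hypothetical and chosen only to illustrate the shape of the trade-off; `cost_per_task` is an illustrative helper, not part of any library.

```python
def cost_per_task(tokens_per_attempt: int, attempts: float, price_per_million: float) -> float:
    """Dollar cost of one finished task: tokens moved, times attempts, times the rate."""
    return tokens_per_attempt * attempts * price_per_million / 1_000_000


# Hypothetical scenarios, for illustration only.
# Baseline: bulky context on the current model.
baseline = cost_per_task(tokens_per_attempt=80_000, attempts=1.2, price_per_million=3.00)

# Half-price model, same bulky context, but it needs more retries.
cheaper_model = cost_per_task(tokens_per_attempt=80_000, attempts=2.5, price_per_million=1.50)

# Same model as the baseline, but with right-sized context.
lean_context = cost_per_task(tokens_per_attempt=30_000, attempts=1.1, price_per_million=3.00)
```

Under these assumed numbers the half-price model ends up slightly more expensive per task than the baseline, while trimming context on the original model cuts the cost to roughly a third. Your own figures will differ; the point is that attempts and tokens per attempt carry as much weight in the product as the rate does.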

Driver 1: rereading the codebase

This is the big one. An agent with no durable source of truth rediscovers your codebase from scratch, often every session, by reading files into the context window to answer questions it answered yesterday. You pay for that reading every time. Serving the agent the current canonical answer instead, on demand, is the single largest lever. In our own tooling, that's roughly a 60% cut in the tokens per lookup, and we wrote up the mechanism in managing context for AI coding agents.
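As a rough illustration of the difference between rereading and being served an answer, the sketch below compares the two paths for a single lookup. It is not our tooling: `CanonicalStore` is a hypothetical in-memory stand-in, and `estimate_tokens` uses a crude four-characters-per-token guess rather than a real tokenizer.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class CanonicalStore:
    """Hypothetical durable store mapping a question to its current canonical answer."""

    answers: dict = field(default_factory=dict)

    def publish(self, question: str, answer: str) -> None:
        """Record (or replace) the canonical answer for a question."""
        self.answers[question] = answer

    def lookup(self, question: str, files: dict) -> tuple:
        """Return (context, tokens). Falls back to reading every file if no answer exists."""
        if question in self.answers:
            answer = self.answers[question]
            return answer, estimate_tokens(answer)
        blob = "\n".join(files.values())
        return blob, estimate_tokens(blob)
```

The first lookup for a question pays for the whole read; once an answer is published, later lookups pay only for the answer. The saving holds only while the stored answer stays current, so a real version would need a way to replace or invalidate answers when the code changes.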

Driver 2: bloated context on every call

The opposite failure from too little context is too much. Stuffing whole directories or long histories into every call, just in case, is expensive and often makes the agent worse, not better, because the signal gets buried. Right-sizing context, giving each call what its task needs and not the entire repo, cuts cost and usually improves the output at the same time.
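One simple way to right-size context is to rank candidate files by relevance to the task and stop at a token budget. The sketch below does that with a deliberately naive relevance score (shared words between the task and the file); the scoring rule, the `select_context` name, and the token estimate are all assumptions made for illustration, and a production system would likely use something stronger such as embeddings or a code index.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def select_context(task: str, files: dict, budget_tokens: int) -> list:
    """Pick the most relevant files that fit within budget_tokens.

    Relevance is the number of words a file shares with the task description.
    Files with no shared words are never included.
    """
    task_words = set(task.lower().split())

    def score(name: str) -> int:
        return len(task_words & set(files[name].lower().split()))

    ranked = sorted(files, key=score, reverse=True)
    chosen, used = [], 0
    for name in ranked:
        if score(name) == 0:
            break  # everything after this point is irrelevant too
        cost = estimate_tokens(files[name])
        if used + cost <= budget_tokens:
            chosen.append(name)
            used += cost
    return chosen
```

The budget makes the cost of each call an explicit decision instead of a side effect of whatever happened to be in the directory.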

Driver 3: rework from bad context

When an agent acts on stale or wrong context, it produces a confident mistake, and you pay twice: once for the wrong work, again for the redo, plus the human time to catch it. Cost and reliability are the same problem here. Fixing the context that makes agents confidently wrong removes the retries, which is why reliability work shows up directly on the bill.
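The "pay twice" effect can be estimated with a small expected-cost calculation. The sketch below assumes each attempt fails independently with the same probability and that every failure costs one human review plus a full retry; both are simplifying assumptions, and `expected_cost_per_task` is a hypothetical helper.

```python
def expected_cost_per_task(attempt_cost: float, p_wrong: float, review_cost: float = 0.0) -> float:
    """Expected cost of one finished task when attempts can fail and be redone.

    attempt_cost: token cost of a single attempt
    p_wrong:      probability an attempt is wrong and must be redone (0 <= p_wrong < 1)
    review_cost:  human cost of catching one wrong attempt
    """
    if not 0 <= p_wrong < 1:
        raise ValueError("p_wrong must be in [0, 1)")
    expected_attempts = 1 / (1 - p_wrong)      # mean of a geometric distribution
    expected_failures = expected_attempts - 1  # every attempt but the last one failed
    return attempt_cost * expected_attempts + review_cost * expected_failures
```

With an assumed failure rate of 50%, the token cost per finished task doubles before any human time is counted, which is why lowering the failure rate shows up on the bill as directly as lowering the token count.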

Driver 4: the fleet multiplies everything

One wasteful agent is a rounding error. A fleet of them, each rereading the repo, each carrying bloated context, each occasionally reworking, multiplies every inefficiency above. This is why cost discipline matters more, not less, as you scale from one agent to many: the savings compound the same way the waste does.

You can't cut what you can't see

Most teams can't say what their agents spend, or on what. Without that, cost is a surprise line on the bill and you're guessing at fixes. Cost per agent and per unit of work is a measurable number; once you can see it, the drivers above stop being abstract and you can attack the biggest one first. This is part of measuring impact honestly, which we covered in measuring AI's impact in production.
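A minimal version of that measurement is a ledger of calls rolled up by agent and by task. The sketch below assumes you can log input and output token counts per call and that you know your per-million-token prices; the `CallRecord` fields and the `summarize` function are hypothetical names, not an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One model call, as it might be logged by an agent runner (hypothetical schema)."""

    agent: str
    task_id: str
    input_tokens: int
    output_tokens: int


def summarize(records, input_price_per_million: float, output_price_per_million: float):
    """Return (cost_per_agent, cost_per_task) dictionaries in dollars."""
    per_agent = defaultdict(float)
    per_task = defaultdict(float)
    for r in records:
        cost = (
            r.input_tokens * input_price_per_million
            + r.output_tokens * output_price_per_million
        ) / 1_000_000
        per_agent[r.agent] += cost
        per_task[r.task_id] += cost
    return dict(per_agent), dict(per_task)
```

Sorting either dictionary by value shows which agent or which kind of task to look at first.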

The order that actually works

If the bill is too high, work the levers in this order:

1. Give agents durable canonical context so they stop rereading the repo. Biggest lever.
2. Right-size context per call so you're not paying to bury the signal.
3. Cut rework by fixing the context that makes agents wrong.
4. Measure cost per unit so you attack the real driver, not a guess.
5. Then, if needed, consider a cheaper model. Last, not first.

The headline: agents are expensive when they reread and rework, not because the model costs too much. Fix the context and the bill follows.

How we help

Cutting agent cost is part of getting agents to run well in production, and it's something we do on our own products before we do it for anyone. We bring the context and observability layers we run ourselves, fit them to your codebase, and train your team to keep the bill honest as you scale. If your agent costs are climbing and you can't see why, that's a solvable problem, and the conversation to have. Talk to us about your team.

Common questions

Why are my AI coding agents so expensive?

Usually context, not the model. Agents that reread large parts of the repo every session, carry bloated context into each call, and redo work after acting on stale context burn tokens fast. The model price is a small part; the waste is in how much context you pay for per call, multiplied across a fleet.

How do I reduce AI agent token costs?

Serve agents the current canonical answer instead of letting them reread the codebase, give each call only the context its task needs, and cut rework by fixing the context that makes agents confidently wrong. Measure cost per agent and per unit of work so you can see what's working. A cheaper model is the last lever, not the first.

Does using a cheaper model lower agent costs?

A little, and often at the cost of more retries, which can erase the saving. The larger waste is paying to feed the same bulky context into every call. Fix that first; it cuts cost without trading away reliability.

How much can better context cut agent costs?

A lot, because context is the biggest line. As one concrete data point, serving agents the canonical answer instead of rereading the repo cuts the tokens per lookup by roughly 60% in our own tooling. Your number depends on your codebase, but context is where the savings live.
