What's the difference between one agent and an agent fleet?

One agent doing a task is a script. A fleet is a coordination problem: several agents working in parallel on different parts of the codebase without colliding or duplicating work. The jump from one to many is mostly orchestration and observability, not a bigger model.

Why do agent fleets fail in production?

Usually one of the four layers (context, orchestration, observability, operating model) is missing. Most often it's observability (running on faith instead of evidence) or the operating model (developers never learned to operate agents, so the tools get routed around). A missing context layer makes agents expensive and confidently wrong.

Can buying more Claude or Copilot seats get us to a production fleet?

Seats give your team model access. None of the four layers comes in the box. Crossing from copilot to a running fleet is an operating problem, not a license problem.

tsukumo

Agency · 16 June 2026 · 4 min read

Running agent fleets in production: what it actually takes

Do you need a bigger model or better prompts to run a fleet?

No. Those help at the margin. The fleet stands or falls on context, orchestration, observability, and an operating model your devs run. It's ordinary production engineering, which means it's buildable and yours to keep.

Going from one agent to a fleet in production isn't a prompt change. It's four engineering layers: context, orchestration, observability, and an operating model your devs run.

The short answer

Running a fleet of AI agents in production takes four engineering layers, not a better prompt: context the agents can trust, orchestration so they don't collide, observability so you operate on evidence, and an operating model your developers actually run. Skip any one and the fleet is expensive, unreliable, or quietly abandoned.

Short version: one agent doing a task is a script. A fleet doing real work in production is a system, and it needs four things the demo never shows: context the agents can trust, orchestration so they don't collide, observability so you can run them on evidence, not faith, and an operating model your developers actually run. Skip any one and the fleet is expensive, unreliable, or quietly abandoned. Here's the honest version of what's involved.

1. Context the agents can trust

An agent is only as good as what it knows about your codebase. At fleet scale, "let it read the repo" is both expensive (every agent, in every session, re-derives the same things) and wrong (the agent picks up stale or duplicate docs). You need a context layer that serves the current, canonical answer cheaply and on demand. Get this right and agents are fast and correct; get it wrong and you're paying premium tokens for confident mistakes. We hit this ourselves and built trovex for it; the version we run uses about 60% fewer tokens per doc lookup than letting an agent re-read the repo. That's the whole game at fleet scale: the same lookup, many times a day, across many agents.
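To make the idea concrete, here is a minimal sketch of a context layer in Python. It is not how trovex works; the `Doc` and `ContextLayer` names, the `revision` field, the file paths, and the word-count stand-in for a tokenizer are all assumptions made for illustration. It shows only the two properties described above: one canonical doc per topic, so stale duplicates are never served, and a lookup that reads one doc instead of the whole repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    topic: str
    path: str      # hypothetical path, for illustration only
    revision: int  # assumption: a higher number means a newer doc
    text: str


class ContextLayer:
    """Toy context layer: indexes docs once, keeps one canonical doc per topic."""

    def __init__(self, docs):
        self._canonical = {}
        for doc in docs:
            current = self._canonical.get(doc.topic)
            # Keep only the newest revision so stale duplicates are never served.
            if current is None or doc.revision > current.revision:
                self._canonical[doc.topic] = doc

    def lookup(self, topic):
        """Return the canonical doc for a topic, or None if the topic is unknown."""
        return self._canonical.get(topic)


def rough_tokens(text):
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


docs = [
    Doc("deploy", "docs/deploy-old.md", 1, "Deploy by hand over SSH."),
    Doc("deploy", "docs/deploy.md", 3, "Deploy with the CI pipeline."),
    Doc("auth", "docs/auth.md", 2, "Tokens rotate every 24 hours."),
]
layer = ContextLayer(docs)

answer = layer.lookup("deploy")
repo_cost = sum(rough_tokens(d.text) for d in docs)  # agent re-reads every doc
lookup_cost = rough_tokens(answer.text)              # agent reads one canonical doc
print(answer.path, repo_cost, lookup_cost)
```

The saving in this toy comes entirely from reading one doc instead of three, so the numbers say nothing about real workloads. The point is the shape: the index is built once, and every agent in the fleet reuses it for every lookup.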
