How do you actually make AI work for a dev team?

Run it as an operating model, not a tool. Point AI at the right tasks, keep change sets small behind real review, measure outcomes over output, serve trusted context, and raise codebase quality. The independent research shows these five levers, not the model choice, decide whether AI helps or hurts.

What is an AI operating model?

The set of practices around the tool that determine its results: which work it does, how that work gets reviewed and shipped, what context it runs on, and how you measure it. The model writes code; the operating model decides whether that code makes your team faster or slower.

Why isn't a better model enough to get value from AI?

Because the studies that found AI slowing teams down used frontier models. The bottleneck wasn't intelligence; it was oversized batches, weak review, missing context, and messy codebases. A smarter model amplifies whatever operating model it lands in, good or bad.

How does a team start fixing its AI operating model?

Measure first. Look at PR size, rework rate, and where AI is pointed. Then fix the cheapest broken lever, usually batch size or context. A scoped assessment maps which of the five levers is costing you most before you invest in the others.
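One way to take that first measurement is sketched below. It is a minimal illustration, not a prescribed method: the `PullRequest` record, its fields, and the definition of "reworked" (a fix or revert that followed soon after merge) are all assumptions made for the example, and the sample numbers are invented. In practice you would fill these records from your own Git host's data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    lines_changed: int   # added + deleted lines
    ai_assisted: bool    # hypothetical flag, e.g. taken from a PR label
    reworked: bool       # assumption: a fix or revert followed soon after merge


def rework_rate(prs):
    """Share of PRs flagged as reworked; 0.0 for an empty list."""
    return sum(pr.reworked for pr in prs) / len(prs) if prs else 0.0


def summarize(prs):
    """Baseline numbers to look at before changing any lever."""
    ai = [pr for pr in prs if pr.ai_assisted]
    other = [pr for pr in prs if not pr.ai_assisted]
    return {
        "median_pr_size": median(pr.lines_changed for pr in prs) if prs else 0,
        "ai_share": len(ai) / len(prs) if prs else 0.0,
        "rework_rate_ai": rework_rate(ai),
        "rework_rate_other": rework_rate(other),
    }


# Invented sample data, for illustration only.
sample = [
    PullRequest(lines_changed=120, ai_assisted=True, reworked=True),
    PullRequest(lines_changed=40, ai_assisted=False, reworked=False),
    PullRequest(lines_changed=900, ai_assisted=True, reworked=True),
    PullRequest(lines_changed=60, ai_assisted=False, reworked=False),
]

report = summarize(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

Splitting the rework rate by AI-assisted versus other PRs is a design choice for the sketch: it shows where AI is pointed and whether that work comes back, which is the comparison the answer above asks for.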

How to make AI actually work for your engineering team

tsukumo


Buying AI vs operating it

Criterion	Buying a tool	Operating a model
What you change	Procurement	How work flows
Where AI points	Wherever	The right tasks
Batch size	Whatever the agent emits	Capped, reviewable
Metric watched	Seats, percent AI code	Change-failure, rework
Context served	The whole repo	Current, scoped
Result	Felt faster	Measured faster

How to actually make AI work for your dev team

Lever 1: Point AI at the right work

Lever 2: Keep batches small, and make review mean something

Lever 3: Measure outcomes, not output

Lever 4: Serve the agent trusted context

Lever 5: Invest in codebase cleanliness

What this looks like in practice

What to do on Monday

How we think about it
