Journal

AI in production.

Field notes from running agent fleets in production — what survives past the demo, and how dev teams become agentic operators.

Latest

When to scale your agent setup: the team signals that actually matter

Most teams scale their AI coding agents right after the demo works, which is the wrong moment. The signal to scale isn't enthusiasm. It's that the constraints keeping solo agent use safe have started to break across the team.

19 Jun 2026 · Agency

The archive · 35

What an AI engineering assessment actually is (and what you walk away with)
Most 'AI assessments' are a slide deck or a readiness quiz with a sales call attached. A real one happens in your repo, in production, and tells you what AI won't fix. Here's what it is and what you keep.
19 Jun 2026 · Agency
Evaluating AI agents in production: getting past vibes
Most teams ship agent changes on vibes: it felt better in the demo. But an agent silently regresses, and "it seems good" isn't a measurement. Evals are a golden set of real tasks with gradeable outcomes, run on every change, with the judge itself checked.
17 Jun 2026 · Agency
Securing AI coding agents: prompt injection is the new RCE
The moment you give an AI coding agent tools, prompt injection stops being a content problem and becomes remote code execution. The agent reads a poisoned repo or issue, and the injected instruction runs with the agent's permissions. You don't prompt your way out of this. You treat the agent as an untrusted client.
17 Jun 2026 · Agency
Replacing a legacy document system with doc-intelligence is a pipeline, not a chatbot
"AI reads your documents" dies on the boring parts. Replacing a legacy DMS isn't a chatbot over your files. It's a split-classify-extract pipeline that has to beat the humans it replaces across an 89,000-document backlog and 142 real categories.
17 Jun 2026 · Agency
Your open-weight model's license is probably lying to you
The license tag on an open-weight model often isn't the license you're actually bound by. Licenses inherit through fine-tunes, and the tag inherits wrong all the time. Ship on the tag and you can be running a restricted model in production without knowing it.
17 Jun 2026 · Agency
How to win over developers who are skeptical of AI (a lead's playbook)
Your best senior tried the AI, got code that was almost right and took longer to fix than write, and quietly stopped. That's not resistance, it's judgment. Here's how a lead earns real buy-in instead of fighting it.
17 Jun 2026 · Agency
Orchestrating AI coding-agent fleets: making many agents act like a team
One agent needs no coordination. Five do. Run them in parallel without it and you get collisions, duplicated work, and a handoff loop where nobody owns the task. Orchestration is the discipline that turns a pile of agents into a team.
17 Jun 2026 · Agency
Is AI-written code safe to ship? Yes, if you review it for what AI gets wrong
The PR looks clean, passes the tests, and imports a library that doesn't exist. AI-written code fails differently than human code, and shipping it safely is a review problem, not a model problem.
17 Jun 2026 · Agency
AI agent observability: knowing what your agents did, and why
Your agents ran overnight. This morning there are merged changes, a token bill, and something that looks off. Can you reconstruct what they did and why? That question is what agent observability answers, and your existing dashboards don't.
17 Jun 2026 · Agency
Getting AI into production at a scale-up: the in-between problem
Thirty engineers, a codebase that works, everyone now using AI, and delivery somehow isn't faster. Scale-ups hit a bind the startup and enterprise playbooks don't fix. Here's the one that fits.
17 Jun 2026 · Agency
Custom JWT claims with Supabase auth hooks (and the two traps)
If you check a user's permissions on every API call, you're doing auth at the wrong layer. Inject the claim into the JWT at login with a Supabase auth hook, so the token carries it. Here's the pattern, plus the two traps that make it fail silently.
17 Jun 2026 · Agency
RAG in production over regulations and client documents
Production RAG over regulations isn't embed-and-retrieve. It's a layered pipeline (hybrid search, reranking, hierarchical summaries, graph context), and the failure that bites hardest is a silent embedding-dimension mismatch that returns confident garbage.
17 Jun 2026 · Agency
Building a senior-colleague AI: versioned skills and gated tools
A loose-cannon agent is dangerous and a shackled one is useless. The way out is to put the judgment in versioned, fail-closed skill definitions and to gate which tools the agent can touch per skill and per turn. Capable without being a liability.
17 Jun 2026 · Agency
The infra failures nobody warns you about in a dockerized AI stack
The thing that takes down an AI system in production usually isn't the model or the app. It's the boring infrastructure underneath, and it fails green. Four real ones from a dockerized AI stack: a mount writing to the void, a runaway container, a tripped autoscaler, dead proxy routes.
17 Jun 2026 · Agency
A five-layer memory for an AI agent that works a client for nine months
A context window is not memory. For an agent that handles one client's accounting across nine months, we built memory as five distinct layers (facts, history, decisions) with a promotion path that turns a one-off ruling into a standing rule.
17 Jun 2026 · Agency
We exposed our whole back office as MCP: 13 servers, 222 tools
Making a real ERP usable by AI agents isn't an integration project. It's a contract. We exposed a fiduciary back office as 13 MCP servers and 222 tools, one server per domain, one uniform envelope, per-principal auth. Here's the shape that holds.
17 Jun 2026 · Agency
Compliance as an architecture constraint: AI for a Swiss fiduciary
In a regulated business, compliance isn't a layer you add after the AI works. It's the constraint that decides what the architecture is allowed to be. Here's how Swiss data and professional-secrecy law shaped every layer of an AI system we built for a fiduciary.
17 Jun 2026 · Agency
How we ship our own product with a fleet of AI agents
Most 'we use AI' stories are an autocomplete in someone's editor. Ours is an org chart: a CTO agent, domain leads, a coordination layer, tickets claimed off a board, one isolated worktree per agent, and a review gate nothing skips. Here's how it actually runs.
17 Jun 2026 · Agency
GPT isn't enough: we wrap deterministic state machines around the LLM
The reliable parts of a production AI agent aren't in the model. They're in the deterministic code wrapped around it. Stop trying to prompt your way to correctness and start constraining what the model is allowed to do.
17 Jun 2026 · Agency
Why your AI coding agents cost so much (and how to cut it)
If your agent token bill keeps climbing, the model price usually isn't the problem. The waste is in how much context you pay for on every call. The real cost drivers, and the levers that actually move them.
17 Jun 2026 · Agency
Measuring AI's impact in production, honestly (no vanity metrics)
Lines of AI-written code and acceptance rates measure activity, not impact. The honest question is whether your team ships more of the right work at the same quality. How to read that, and the one number that's actually real.
17 Jun 2026 · Agency
Governing AI agents in production: control, accountability, and audit
Making an agent reliable is one problem. Governing a non-human actor with commit access is another: who owns its actions, how far a bad one reaches, and whether you can prove what happened. The governance layer, plainly.
17 Jun 2026 · Agency
Five silent failures in a production invoice pipeline
The bugs that hurt a production pipeline don't crash. They return 200, paint the dashboard green, and quietly stop doing their job. Here are five we hit running an agentic accounting platform, and how we caught them.
17 Jun 2026 · Agency
What AI readiness actually means for a dev team
AI readiness isn't a license count. It's whether your team can run agents on real work in production. Six dimensions tell you where you actually stand, and which gap is stalling you.
16 Jun 2026 · Agency
How to evaluate an AI consulting partner: a CTO's checklist
Most AI consultancies sell decks or dependency. A few transfer real capability onto your team. Six questions that tell the difference before you sign.
16 Jun 2026 · Agency
Managing context for AI coding agents (why they lose the thread, and the fix)
Agents lose context because a big repo doesn't fit a window, and a bigger window doesn't fix it. The fix is serving the canonical answer on demand.
16 Jun 2026 · Agency
How to make AI coding agents reliable in production
Reliable agents aren't a better model, they're the engineering around it: scoped permissions, review gates, observability, and context the agent can trust.
16 Jun 2026 · Agency
AI that's 10x, not cheaper: what prod-grade agentic output means
If the goal of AI is to ship the same work cheaper, you'll be disappointed. The win is prod-grade output and roughly 10x from the team you already trust.
16 Jun 2026 · Agency
What agentic operators actually do (the operator, not the copilot)
An agentic operator runs AI agents that do whole units of work instead of typing every line. We operate this way every day — here's the job, concretely, and the one skill that's actually new.
16 Jun 2026 · Agency
Augment, never replace: turning a dev team into agentic operators
The fear that AI is there to replace developers is what quietly caps the capability you paid for. Augment-not-replace isn't ethics, it's what works.
16 Jun 2026 · Agency
What agentic-dev training actually looks like
Most AI training is a slide deck and a prompt cheatsheet. Turning a dev team into agentic operators is hands-on, on your own codebase, on real production work.
16 Jun 2026 · Agency
Build vs buy your AI capability: the CTO's real decision
Buying an AI tool gives access, not capability. Building alone burns senior quarters. For most teams the real answer is neither. The honest build-vs-buy framing.
16 Jun 2026 · Agency
Why AI demos die before production
The AI demo always works. Then it meets your real codebase, standards, and scale, and quietly dies. The demo-to-production gap is where most AI initiatives fail.
16 Jun 2026 · Agency
The copilot-operator gap: why your Claude seats aren't enough
Your team has AI autocomplete, maybe 10% of what coding agents can do. The gap to agents running work in production is an operating problem, not a license problem.
16 Jun 2026 · Agency
Running agent fleets in production: what it actually takes
Going from one agent to a fleet in production isn't a prompt change. It's four engineering layers: context, orchestration, observability, and an operating model your devs run.
16 Jun 2026 · Agency

Get the notes

We write when we've shipped or learned something about running AI agents in production. No cadence quota, no filler.
