David Walsh
Essay

Stop counting tokens

I spent the early 2000s shaving cycles off inner loops. Twenty years later I watch teams do the same thing with LLM tokens — and miss the actual ROI question.

Every disruptive technology goes through the same two phases. First you figure out if it's useful. Then, much later, you figure out how to make it cheap. Teams keep trying to skip the first phase.

CPU cycles, 2003

My first real dev job was at Bedbookers, a UK travel startup, writing PHP with a bit of Java and .NET on the side. Shared hosting, MySQL 4, APC if you were lucky. I spent entire afternoons inside request-handlers and hot loops, counting things I had no business counting. I knew which PHP string functions allocated. I knew when to reach for StringBuilder over concatenation in Java. I could tell you how many queries the booking page fired, off the top of my head.

Here's the sort of thing I'd do:

// Before: a string allocation every iteration
String rows = "";
for (Booking b : bookings) {
  rows += renderRow(b);
}

// After: one buffer, one allocation at the end
StringBuilder rows = new StringBuilder(bookings.size() * 128);
for (Booking b : bookings) {
  rows.append(renderRow(b));
}

I was proud of this. I would write it, measure it, and post the millisecond difference in the team channel. Sometimes I'd save 40 ms on a page that spent 800 ms in the database. Sometimes I'd save nothing, because the JIT had already figured it out. Either way, I'd made the code harder to read.
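The "measure it" part was a crude harness of the sort below — System.nanoTime() around each variant, run once, screenshot the difference. A sketch, not a recommendation: without warm-up runs this methodology is unreliable on a modern JIT, and a harness like JMH is the right tool today.

```java
public class ConcatBench {
  // The "before" variant: a new String allocation on every iteration.
  static String withConcat(int n) {
    String rows = "";
    for (int i = 0; i < n; i++) rows += "row" + i;
    return rows;
  }

  // The "after" variant: one growable buffer.
  static String withBuilder(int n) {
    StringBuilder rows = new StringBuilder(n * 8);
    for (int i = 0; i < n; i++) rows.append("row").append(i);
    return rows.toString();
  }

  public static void main(String[] args) {
    int n = 5_000;
    long t0 = System.nanoTime();
    withConcat(n);
    long t1 = System.nanoTime();
    withBuilder(n);
    long t2 = System.nanoTime();
    // Single-shot timings: good enough for a 2003 team channel, not for 2026.
    System.out.printf("concat: %d ms, builder: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
  }
}
```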

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. — Donald Knuth

The quote is older than I am. Everyone who's ever written a loop has heard it. And yet there we were, a team of twenty engineers, running benchmark scripts against loops that executed once per user click — on features that no customer had asked for.

A year later Moore's Law handed us faster boxes, MySQL got a query cache, and the whole exercise was rounding error. The product shipped or it didn't, and nobody remembered the loop.

The token obsession

I see the same reflex today, pointed at a different meter. A team builds an internal tool that uses an LLM to triage support tickets. It works. Agents love it. Response times drop. And then the first monthly bill arrives and somebody panics.

So the team spends an afternoon trying to shave 200 tokens off a 10,000-token prompt. They write a custom summariser. They split the call into three smaller calls to keep the context down. They add a cache. They argue about whether Haiku is good enough.

Meanwhile the support agent whose ticket is sitting in the queue is being paid £15 an hour to wait.

This is cycle-counting in a new costume. The bill is visible, so it feels urgent. The labour cost it's displacing is not visible on the same dashboard, so it doesn't feel like anything at all. The asymmetry is the trap.

ROI before efficiency

On any disruptive technology there is an adoption phase and an optimisation phase, and they are not the same phase. Adoption is figuring out whether the thing works at all, whether anyone wants it, whether it changes the shape of the business. Optimisation is squeezing the unit economics once it does.

The order matters. Optimising the unit economics of a thing that nobody uses is a very expensive way to produce nothing. Worse, the optimisation work typically makes the thing harder to change — which is exactly what you need to be doing in the adoption phase.

Cost-saving is not the first question on a disruptive technology. It is maybe the third or fourth. The first question is whether the product is good enough that anyone will pay for it, internally or externally. The second is whether the ROI, measured honestly, is positive. Token spend is a line on that calculation — not the calculation.

Try it yourself

Here's a calculator. Plug in your own numbers — a task your team does repeatedly, the time it takes a human, what you pay that human, and what the equivalent LLM call costs. The result is per-task cost, weekly and annual savings, and a rough break-even multiple.

ROI calculator — tokens vs. labour, per task (example values)

Human cost / task: £20.00
AI cost / task: £0.06
Weekly saving: £997
Annual saving: £51,850.24
Break-even multiple: 347.2×

Break-even multiple = human cost ÷ AI cost per task. A 50× multiple means the human alternative costs fifty times what the tokens do.
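The arithmetic behind the calculator is nothing exotic. Here's a minimal sketch in Java — the method names, the £40/hour rate, the 30-minutes-per-task and 50-tasks-per-week inputs are illustrative assumptions, not the calculator's actual implementation:

```java
public class RoiCalculator {
  // Cost of one task done by a human: hourly rate scaled by minutes spent.
  static double humanCostPerTask(double hourlyRate, double minutesPerTask) {
    return hourlyRate * (minutesPerTask / 60.0);
  }

  // Break-even multiple: human cost divided by AI cost per task.
  static double breakEvenMultiple(double humanCost, double aiCost) {
    return humanCost / aiCost;
  }

  // Saving across a week's worth of tasks.
  static double weeklySaving(double humanCost, double aiCost, int tasksPerWeek) {
    return (humanCost - aiCost) * tasksPerWeek;
  }

  public static void main(String[] args) {
    double human = humanCostPerTask(40.0, 30.0); // £40/h, 30 min/task -> £20.00
    double ai = 0.06;                            // illustrative per-call token cost
    System.out.printf("Human cost/task: £%.2f%n", human);
    System.out.printf("Break-even multiple: %.1f×%n", breakEvenMultiple(human, ai));
    System.out.printf("Weekly saving (50 tasks): £%.2f%n", weeklySaving(human, ai, 50));
  }
}
```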

For most knowledge-work tasks the break-even multiple isn't anywhere near 1. It's an order of magnitude out, sometimes two. The first time I ran this calculation on a real workflow I assumed I'd made an arithmetic error.

Cheaper than humans

The rule of thumb is this: if the tool is doing something genuinely useful, the tokens are cheaper than the human alternative. Not a bit cheaper. Often a hundred times cheaper.

A knowledge worker in the UK costs somewhere between £30 and £120 an hour once you include salary, tax, benefits, and overhead. That's 50p to £2 a minute. A 10,000-token Sonnet call costs about 7p. If that call replaces three minutes of that person's time, the maths is over before it starts.
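The back-of-envelope version of that paragraph, as code. The 7p call cost and the 50p–£2 per-minute labour band are the figures above; the three-minutes-replaced assumption is illustrative:

```java
public class Envelope {
  // How many times over the human alternative costs the tokens.
  static double multiple(double labourPerMinute, double minutesReplaced, double callCost) {
    return labourPerMinute * minutesReplaced / callCost;
  }

  public static void main(String[] args) {
    double callCost = 0.07;             // ~7p for a 10,000-token call
    double[] band = {0.50, 2.00};       // labour cost band, £/minute
    double minutesReplaced = 3.0;
    for (double rate : band) {
      System.out.printf("£%.2f/min labour: %.0f× the call cost%n",
          rate, multiple(rate, minutesReplaced, callCost));
    }
  }
}
```

Even at the bottom of the band the multiple is around 21×; at the top it's around 86×.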

The corollary, which is the one most teams miss: if the tokens are not cheaper than the human alternative, you probably haven't found a useful application yet. That's a product problem, not a cost problem, and no amount of prompt-trimming will fix it.

When efficiency matters

There is a right time to care about token cost. Three situations, in my experience:

In all three cases the work is worth doing, because the product already works. You're optimising something real. That's very different from optimising something hypothetical.

Takeaway

Measure outcomes, not tokens. Ask whether the workflow is useful, whether the ROI is positive against the labour it displaces, whether anyone would miss it if you turned it off. Only when all three answers are yes does it make sense to look at the token bill.

The JIT got better. The boxes got faster. The tokens will get cheaper — they're already an order of magnitude cheaper than they were eighteen months ago. Counting them now, on a product that isn't finished, is the 2026 version of swapping in a StringBuilder the JIT was about to optimise for you.