The Real Bill for AI at Work

The invoice is the cheap part. A seat, some tokens, a line item the finance team can read in a second. I work with agents every day, and the number on that invoice has almost nothing to do with what the tools actually cost. The real bill arrives somewhere else, in a currency nobody puts on a purchase order.

You can feel the gap if you have lived inside it, and I have, running a single agent across more than thirty of my own repositories. The tool makes you faster on the thing in front of you, immediately and obviously, and you walk away sure the whole system just got faster too. It usually did not. This is the most consistent finding in the workplace research, and it is strange enough to sit with: in Glean's survey of digital workers, about three quarters said AI makes them more productive, and only about one in eight said their organization was performing significantly better as a result. Denmark ran the cleaner test, linking adoption surveys to actual payroll and hours across thousands of workplaces, and in the first couple of years found no meaningful effect on wages or hours at all. Fast individual, flat organization. The same shape keeps coming back.

That gap is not a mystery once you stop looking at the invoice and start counting what the invoice leaves out. I have come to think of AI cost as four ledgers, not one. Three of them never show up in the budget, and they are where the money actually goes.

The four ledgers

Financial is the visible one. Seats, tokens, infrastructure, the security review, the integration work, the vendor contract. This is the cost everyone argues about and the only one most companies actually track. It is also, in my experience, the least interesting, because cheap tokens do not mean cheap adoption. The economists who study general-purpose technologies have a name for the trap: the modern productivity paradox, where the spending and the reorganizing come now and the visible gains come much later, if you do the reorganizing at all. The expensive part of AI was never the model. It is everything you have to rebuild around it. At my own one-person scale there is almost nothing to rebuild, which is exactly why the tool stays cheap for me. At the scale of a large organization, the rebuild is most of the bill.

Attention is the first hidden one, and it is the one I pay myself, every day. It is the cost of deciding what to ask, loading enough context for the answer to be any good, reading the output closely enough to trust it, catching the place where it went confidently wrong, and steering it back. Glean's index put a number on this that matched my own week uncomfortably well: workers reported spending around six hours a week "botsitting," feeding and checking and correcting the tools, more time than they spent using the tools to produce the actual work. None of that shows up anywhere. It feels like working. It is mostly supervising.

The dangerous part of the attention ledger is that it is easy to miss from the inside, because the felt experience lies to you. A 2026 study found what the authors called a speedup illusion: people expected AI to make them much faster, and reported less effort using it, even on tasks where it did not actually save any time. Less strain is real and worth something. But less strain is not the same as more output, and a tool that feels like a shortcut while quietly costing you the same hour is the easiest expense in the world to under-count. I have learned to distrust the feeling of speed and look at what actually shipped.

Coordination is the second hidden ledger, and it is the one that eats the gains. AI makes you faster at the parts of your job you can change by yourself: the writing, the summarizing, the first draft, the lookup. It does almost nothing for the parts that need other people: the approval, the handoff, the unclear ownership, the review that is already a bottleneck. So the local speedup is real and the system speedup is not, because the work still has to pass through all the same human gates it did before.

Worse, the local speedup can make the system slower. Stanford and BetterUp gave the failure mode a good name, workslop: polished, low-substance AI output that gets passed to a colleague who then has to spend real time figuring out it is hollow and redoing it. Around forty percent of desk workers reported receiving some in the prior month, costing roughly two hours each to resolve. The person who generated it saved ten minutes. The person downstream paid two hours. That cost is invisible to the one who created it, which is exactly why it keeps happening. One person's productivity win is another person's tax, and the tax is larger than the win.

Development is the slowest ledger and the one I worry about most for the people coming up behind me. It is the cost of what the work used to teach. The clearest evidence that AI can teach is also the clearest warning that it can stop teaching: in the big call-center study, AI raised productivity about fourteen percent on average and about thirty-four percent for novices, apparently by transferring the habits of the best workers to the newest ones. That is the good version, compressed apprenticeship, the experience curve folded shorter. But there is a destroyed version of the same mechanism. If a junior uses AI to skip past the messy, unglamorous repetitions, they also skip the thing those repetitions were building under the surface: the taste, the debugging instinct, the sense of when an architecture is wrong before you can say why. A 2025 paper makes the structural case that this is real, that systems which replace valuable human activity can erode the very capacities that activity used to develop.

Which means not all friction is waste. Some friction is training, and a company that strips out every repetition in the name of efficiency can save a great deal of time while hollowing out its own talent pipeline. PwC's barometer is already picking up the shift: AI-exposed entry-level jobs are now far more likely to demand the senior, human-intensive skills, judgment and leadership, earlier than people used to have to supply them. The floor of the career is rising. That is good for the work and brutal for whoever is standing on the old floor.

The skill that runs through all four

There is one skill that decides how heavy each of those ledgers gets, and it is not prompting. It is knowing where the tool is good and where it is quietly bad.

The research has a sharp picture of this. When BCG's consultants used a frontier model on tasks inside its range, they did better and finished about a quarter faster. On a task built to sit just outside that range, the same tool made them roughly nineteen percent more likely to get the answer wrong, because it was just as fluent and confident on the problem it could not actually do. The authors called the boundary a jagged frontier, and the word is right. It is not a clean line between hard and easy. It is jagged, strong in one place and subtly broken half a step over, and from the inside the two feel identical.

So the real skill is judgment under uncertainty: knowing when to delegate, when to steer, when to verify, and when to close the tool and think alone. I do this constantly, mostly without noticing, and the days it goes well are the days I correctly guessed which side of the frontier I was on. The cognitive scientists frame the underlying move as value-based: we offload a piece of thinking when it feels effortful or low-value, weighing the cost of doing it ourselves against the cost of checking what the machine did. Good offloading frees you for higher work. Bad offloading weakens a muscle you still need. Telling them apart, in the moment, on a specific task, is the whole job now.

What the four ledgers are for

I keep coming back to a frame I have written about before, which is that most of the work of getting anything right is making the real state of a system visible so nobody has to go hunting for it. The four ledgers are that move pointed at AI cost. The reason companies buy AI and cannot find the return is not that the return is fake. It is that they are reading one ledger and getting billed on four. The gains are real and local. The costs are real and spread across attention, coordination, and development, where no dashboard is looking.

None of this is an argument to use less AI. I use more of it every year, and it has changed how the work actually feels. It is an argument to stop treating AI as a cheap answer engine and start treating it as a cognitive prosthetic, something that can genuinely extend you and can also, used badly, weaken the body around it. A prosthetic is worth a great deal. It is also not free in the way a subscription is free once you have paid for it.

The mature version of the question is not how much AI your people should use. It is which parts of human thinking you want to make cheaper, which parts you want to protect, and which parts you want to grow. The companies that win will not be the ones that buy the most. They will be the ones that redesign the work while keeping alive the human capacities that made the work worth doing in the first place. That decision does not live on the invoice. It never did.