
“If that $500,000 engineer did not consume at least $250,000 with tokens, I am going to be deeply alarmed.”
Jensen Huang, CEO of NVIDIA
That is a striking statement, not only because of its scale, but because it suggests a new operating logic: AI spend is becoming a proxy for engineering performance.
Tools get bought. Tokens get burned. Budgets rise. It sounds rational, responsible, even visionary. But something about it feels off.

“How much of the money flowing into AI coding agents is driven by a clear hypothesis about value and productivity, and how much is driven by hype, peer pressure, and FOMO?”
I call this Vibe Spending. And in many engineering organizations I speak with, it has become the default mode of AI tooling decisions.
One disclaimer before we go further: this is not another post about predicting the future of software engineering. I have opinions, but they are no better than yours. This is simply my current read on what is happening in the market, and what seems to be driving it.
1. The Default Is Yes
The first pattern I see is simple: once someone asks to buy an AI coding tool, it becomes very hard to say no. There are a few reasons for that:
These tools are actually good
Some of these coding agents are remarkably good. Not marketing good, actually good.
I have seen engineers build impressive UIs in a few hours, debug a 300-line legacy function in under five minutes, and generate a full suite of tests for a module they had never seen before, all with the help of an AI coding agent. Before dismissing this as hype, ask a senior engineer on your team to do the same tasks without AI and measure the difference. The delta is real.
That does not mean the tools are magical, or that every workflow benefits equally. But the gains are tangible enough that saying “this is all overblown” is no longer a serious position.
They keep getting better
Unlike most software categories, where improvements are incremental and often barely noticeable, AI coding agents have been improving at a pace that is difficult to ignore.
Eighteen months ago, the best models would often produce code that looked plausible but did not compile. Today, coding agents that use frontier models can handle much more complex tasks: reasoning through tradeoffs, working across large codebases, and producing code that a senior engineer would be proud to ship.
By the time you finish reading this post, there is a reasonable chance that a new model has been released that outperforms everything that existed when I started writing it.
The 10x engineer dream
The idea of the 10x engineer long predates AI coding agents. I even wrote about it ~6 years ago.
AI coding agents promise that they can turn your average engineer into a 10x engineer. This is partly true, partly false, and mostly complicated.
What I do know is that the promise is compelling enough that very few leaders want to be the person who said no to the tool that might have transformed team productivity. The fear of being wrong in the conservative direction is stronger than the fear of being wrong in the permissive one.
Pressure comes from both directions
The pressure to increase the usage of AI coding agents doesn’t come from one direction; it comes from both. From the bottom, your engineers are asking for better tools. They see what their friends at other companies are using. They read the benchmark posts. From the top, your CEO or board is asking things like:
I heard that <company_name> has increased engineering productivity by 53% using <coding_agent_name>. What are our numbers?
When pressure comes from both directions simultaneously, the path of least resistance is to say yes.
And that is the key point: in most software buying decisions, the burden is on the champion to prove the tool is worth the money. With AI coding agents, the burden is increasingly on the skeptic to explain why the company should wait.
2. Budget Never Holds
The second pattern I see is that once adoption begins, it becomes harder than expected to stay within the originally allocated budget.
The reason is simple: AI coding agent spending does not behave like a normal software budget. It is not just a matter of counting seats and multiplying by a monthly price. Costs expand across tools, models, workflows, automation layers, and entirely new use cases. What looks manageable in the planning and approval phases often becomes much harder to control in production. Here is why:
Pricing models are complex
AI coding agent pricing is difficult to understand and even harder to forecast.
Input tokens are priced differently from output tokens. Cached tokens are priced differently from uncached ones. Different models from the same provider can have materially different price points. Some tools charge per seat, others per token, others per request, and some combine all three. Even with strong usage visibility, translating that usage into a predictable monthly budget is not straightforward.
The problem gets worse because engineers rarely use just one tool. They may use Cursor or Windsurf as their primary IDE, Claude Code or Codex for harder tasks, CodeRabbit for code review, Mermaid AI for diagramming, and DataDog Bits AI SRE for incident response. Each product comes with its own pricing model, token budget, and billing cycle. Multiply that stack across an engineering team, and the “$200 per engineer per month” budget you approved is not going to cut it.
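To make the forecasting problem concrete, here is a minimal sketch of a per-task cost estimate. Every number in it is a made-up illustration, not any vendor's actual rate card; the point is only that three token categories priced differently make the arithmetic non-obvious.

```python
# Hypothetical per-million-token rates -- real prices vary by provider
# and change often. Input, cached input, and output are priced differently.
MODELS = {
    "frontier": {"input": 3.00, "cached": 0.30, "output": 15.00},
    "mid-tier": {"input": 0.25, "cached": 0.03, "output": 1.25},
}

def task_cost(model: str, input_toks: int, cached_toks: int, output_toks: int) -> float:
    """Dollar cost of a single request, given its token counts."""
    p = MODELS[model]
    return (input_toks * p["input"]
            + cached_toks * p["cached"]
            + output_toks * p["output"]) / 1_000_000

# One "simple" agent task: 40k fresh input, 120k cached context, 8k output.
cost = task_cost("frontier", 40_000, 120_000, 8_000)
print(f"${cost:.2f} per task")  # -> $0.28 per task

# 50 such tasks a day over 22 workdays already blows past a $200/month seat.
print(f"${cost * 50 * 22:.0f} per engineer per month")  # -> $304
```

Multiply a calculation like this across several tools, each with its own rates and billing cycle, and the spreadsheet stops being simple long before the first invoice arrives.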
Then there is model choice. As better models become available, cost tends to migrate upward. Sometimes a more capable model is also more efficient, but usually a smarter model means more tokens. In most organizations, the decision at the point of use is not driven by cost. Engineers optimize for speed and quality. Very few stop mid-flow to ask whether this task really requires the most expensive model tier available.
In theory, model routing should solve part of this by matching task complexity to the right cost-performance tier. In practice, most engineering organizations have not implemented it well, and many individual users do not think about it at all.
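The routing idea itself is simple. A sketch, with tier names, thresholds, and the `Task` shape all invented for illustration:

```python
# A minimal model-routing sketch: map estimated task complexity to a cost
# tier. Tiers and thresholds are illustrative, not any vendor's actual API.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    files_touched: int
    needs_architecture_reasoning: bool

def route(task: Task) -> str:
    """Pick the cheapest tier that is plausibly good enough."""
    if task.needs_architecture_reasoning or task.files_touched > 20:
        return "frontier"   # expensive reasoning model
    if task.files_touched > 3:
        return "mid-tier"   # capable general-purpose model
    return "small"          # cheap model for local edits

print(route(Task("rename a variable", 1, False)))            # -> small
print(route(Task("refactor payment module", 12, False)))     # -> mid-tier
print(route(Task("redesign service boundaries", 40, True)))  # -> frontier
```

The hard part is not the code; it is that nobody at the keyboard has an incentive to run it. Defaulting to the strongest model is always the locally rational choice.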
Usage per task expands faster than expected
Even when the seat count stays flat, the cost per task often rises much faster than you expect.
Reasoning models are a good example. When an engineer asks a model to work through a hard architectural question, the model may spend a large volume of tokens reasoning before producing any visible output. On difficult tasks, the cost of a single query can be an order of magnitude higher than a standard interaction. Engineers usually like the result. Finance usually likes it less.
Larger context windows introduce a similar dynamic. They are super useful. Feeding in a large codebase, a full PR diff, or years of technical decisions can materially improve output quality. But more context means more input tokens, and the temptation to include everything “just in case” is hard to resist.
Agentic loops push this even further. In a simple chat interface, token usage is roughly tied to the conversation. In agentic workflows, one high-level instruction can trigger dozens or hundreds of observe-reason-act-test cycles before the system completes the task. As users become more comfortable with reducing human checkpoints, agents run longer and more independently. I have seen a single “refactor this module” request trigger more than 10 loops before the agent considered the work complete. The output was excellent. The token bill was also excellent.
Tool use compounds the problem. Modern coding agents are increasingly connected to external systems through protocols like MCP. A request that looks like a simple prompt on the surface may actually trigger a long chain of searches, documentation fetches, issue lookups, code inspections, and API calls underneath. What the engineer experiences as one task may be fifty billable operations.
Sub-agents extend the same logic into parallelism. Ask an orchestration framework to migrate a codebase from one SDK to another, and it might spin up one agent to update the backend, another to adapt the frontend components, a third to rewrite tests, and a fourth to update internal documentation. That kind of decomposition can compress a migration that would normally take days into a much shorter window. It can also multiply token consumption just as quickly, because each workstream is reasoning, reading context, and generating output independently.
Parallel prompting takes the pattern even further. Instead of asking one strong model, the user asks several in parallel and either chooses the best answer directly or uses another model to judge them. There are real use cases for this, especially when the decision is high-stakes and hard to reverse. But when it becomes a default habit on routine tasks, costs can scale far faster than you realize.
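The cost of this pattern is easy to underestimate, because the judge also has to read every candidate answer. A rough sketch, with all token counts and both helper functions invented for illustration:

```python
# Parallel prompting: ask N models, then have a judge pick the best answer.
# The cost is not simply N x one call -- the judging pass re-reads every
# candidate. All token counts here are illustrative assumptions.
def single_call(prompt: int = 10_000, answer: int = 5_000) -> int:
    return prompt + answer

def fanout_with_judge(n_models: int, prompt: int = 10_000,
                      answer: int = 5_000, judge_overhead: int = 2_000) -> int:
    candidates = n_models * (prompt + answer)              # N independent calls
    judging = prompt + n_models * answer + judge_overhead  # judge reads all
    return candidates + judging

print(single_call())         # -> 15000
print(fanout_with_judge(4))  # -> 92000 (over 6x a single call, not 4x)
```

For a one-off, irreversible decision, a 6x premium may be money well spent. As a default habit on routine tasks, it quietly multiplies the baseline.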
Consumption is no longer bounded by the workday
Not long ago, AI coding usage mostly happened when an engineer was at their desk, actively working. That constraint is disappearing.
Engineers can now launch an agent, leave the office, and continue supervising it from a phone or messaging interface. They can approve steps, redirect the task, and keep the workflow running long after the formal workday has ended. That is useful, and in many cases exactly how the technology should be used. But it also means token consumption is no longer naturally bounded by working hours or physical presence.
The same thing happens once execution becomes scheduled or event-driven. If AI systems are reviewing pull requests, responding to monitoring alerts, summarizing metrics, or handling operational workflows on a trigger, spend no longer maps cleanly to any engineer’s visible activity. Your AI tools can be consuming tokens at 3 AM on a Saturday, and no one may notice until the invoice arrives.
Demand expands beyond the original use case
The final issue is that once engineers get access to a capable coding agent, they do not use it only for coding.
They use it to write technical specs, summarize architecture discussions, draft stakeholder updates, prepare for performance conversations, and generally offload a wider set of knowledge work. This may be good for the business. In some cases, it may be very good for the business. But it is rarely what the original budget assumed.
That is the broader pattern: AI budgets rarely break because of one dramatic decision. They break because usage spreads in every direction at once. More tools, more models, more autonomy, more background execution, and more use cases. The approved budget feels fixed. The actual exposure is variable.
3. ROI Is Hard to Prove
The third pattern I see is that AI coding agents do not just change engineering output. They also make productivity harder to measure and ROI harder to prove.
You might argue that engineering productivity was always difficult to measure, and that is true. But AI makes it harder. A growing body of research suggests that the relationship between AI usage and engineering productivity is neither linear nor easy to isolate. That should not be surprising. The closer you get to real business value, the harder causality becomes.
More output is not the same as more value
The industry’s most established developer productivity frameworks, including DORA and SPACE, are useful, but they were not designed to isolate the effects of AI assistance.
That matters because AI can dramatically increase output. One engineer may suddenly produce far more commits, open more pull requests, or ship more code in a given week simply because an agentic workflow makes those actions easier. But more output is not the same as more value. More PRs do not necessarily mean faster feature delivery, better customer outcomes, or higher-quality systems.
I have seen teams where AI adoption clearly increased commit velocity while also increasing the amount of time senior engineers spent reviewing, correcting, and cleaning up AI-generated work. From a dashboard perspective, output went up. From an organizational perspective, the picture was much less clear.
Quality signals arrive late
Quality metrics seem like a better signal, but they come with their own problems.
First, there is a significant lag; a bug introduced into the codebase today might not cause a production incident for months, by which time it is nearly impossible to trace back to its origin.
Second, causality is almost always ambiguous. If incidents go up after AI adoption, is that because of AI-generated code, because of faster shipping velocity, or because of organizational changes that happened at the same time? Separating these signals is extremely hard in a real engineering org.
Business impact is the right metric and the hardest one
The ultimate measure of whether AI coding tools are working is whether they are helping you build better products faster and translating that into business outcomes. This is the right question and also the hardest one to answer.
There are too many variables between “engineer uses AI to write code” and “company grows revenue”. Feature quality, go-to-market execution, and competitive dynamics all sit between the tool and the outcome.
Part of the challenge is that AI may accelerate tasks that were never the real bottleneck. If code gets written faster but prioritization, review, rollout, or adoption does not improve, local productivity may rise without a comparable gain in end-to-end business output.
Value is not evenly distributed across tasks
AI coding tools do not create the same value across all types of engineering work.
They tend to perform best on tasks where the objective is relatively clear, and the output is easy to verify: greenfield implementation, test generation, UI scaffolding, refactors with tight constraints. They are less reliable in mature systems shaped by years of undocumented decisions, legacy constraints, and subtle business logic.
The frontend-backend distinction matters too. Frontend mistakes are often visible quickly. Backend mistakes can look perfectly reasonable until they create a production issue much later. Without understanding the actual distribution of tasks across a team, any aggregate productivity number risks saying very little.
Value is not evenly distributed across people
Usage across people is not evenly distributed either.
When we increased the AI budget per engineer, we expected usage to spread roughly evenly across the team. It did not.
A relatively small group of engineers consumed most of the tokens. Some burned through a full monthly budget in a few days and immediately asked for more. Others barely used the tools.
Were the heavy users the most productive people on the team? Were they getting disproportionate value? Or were they simply the most enthusiastic adopters? In many organizations, the honest answer is that no one really knows.
That uncertainty matters. Not because the tools have no value, but because the economic case is often much less measurable than the confidence with which the budget gets approved. That, again, is what I mean by Vibe Spending.
4. Future Budgeting Must Change
The fourth pattern is that AI spending often escapes the budgeting discipline that organizations apply to almost everything else.
When we buy a developer tool, we usually have a thesis, an owner, and a budget. When we hire an engineer, we usually have a thesis, an owner, and a budget. AI often slips past that level of scrutiny.
The spending starts in one category, behaves like another, and eventually forces a much larger allocation conversation than anyone planned for.
It starts in the tools budget
The process usually begins innocently enough.
An AI coding assistant gets added to the existing tools budget at something like $20 per engineer per month. At that price, it looks like just another SaaS subscription. The budget committee approves it without much friction because it fits neatly inside an existing line item, and the organization gets to feel appropriately forward-looking.
Then the pressure builds. Engineers hit token limits. Team leads ask for access to better tools, and engineers start using smarter models. Executives start asking what the company is doing about AI after hearing aggressive productivity claims from peers and competitors.
So the budget gets raised. What started at $20 becomes $200 per engineer per month, often justified by some version of a 10x productivity narrative. Even then, it still feels like a tooling decision, because it remains buried inside the same budget bucket.
The problem is that the new number often turns out not to be a ceiling, but a floor. Within weeks, some engineers have consumed their full monthly budget in a matter of days and are asking for more. Others barely use the tools at all. The average may still look manageable. The distribution does not.
Then the tools budget starts expanding around it
There is a theory that AI coding agents will consolidate software spend by replacing a long tail of specialized tools. Some analysts and investors describe this as a coming SaaS apocalypse.
In practice, at least for now, the opposite is happening. The rest of your tools budget is expanding alongside the AI spend, not shrinking because of it.
Tools like Figma, Miro, Notion, Jira, DataDog, Snowflake, and GitHub are all seeing increased consumption. More seats, more usage, higher bills. The reason is straightforward: MCP servers have turned every engineer into a potential power user of tools that used to be limited to specialists.
Suddenly, everyone in the org is connecting their coding agent to Figma and doing UI design work they never touched before. That may be great for execution. It may even be exactly the right organizational outcome. But it means AI is not simply replacing software spend. In many cases, it is amplifying it.
Eventually, it becomes a payroll question
Here is the conversation that nobody wants to have, but that every engineering leader will eventually need to have.
If AI coding agents are genuinely making engineers more productive, then what exactly is the organization expecting to do with that additional capacity? Ship more with the same team? Slow hiring? Reallocate headcount into new areas? Raise the output bar? Change the mix of senior and junior talent?
This is the point where the budget conversation stops being about tools and becomes a team design conversation.
That is a much more sensitive conversation. Tools budgets can be increased or cut with relatively little organizational impact. Payroll decisions affect org structure, trust, and ultimately people’s livelihoods. The orgs that handle this thoughtfully are the ones that had a clear answer to the question they should have asked at the beginning:
What exactly were we expecting these coding agents to do for us?
Conclusion
AI coding agents are powerful, and the pressure to adopt them, from your engineers, from your CEO, from the market, is real and legitimate. I am not arguing against adoption. I am arguing against unlimited spending without a value hypothesis and a plan.
Here is a simple test: before your next AI tooling purchase, write down in one sentence what you expect the tool to change, how you will measure that change, and when you will review it. If you cannot write that sentence, you are Vibe Spending.
The teams that will get the most out of AI coding agents are not the ones with the largest budgets. They are the ones that treat AI tooling with the same rigor they apply to every other strategic investment: a clear hypothesis, a way to measure it, and the discipline to act on what they find.
Vibe coding has its place. Vibe spending, at the scale of a large engineering org, is just a very expensive way to feel like you are winning.
Originally shared on Ofer Karp's LinkedIn.