AI Value Is a Measurement Problem. Boards Are Treating It as a Headcount Decision.

Here is a conversation happening in boardrooms right now, almost word for word.

The mandate came down clear. Cut some percentage of the team, let AI absorb the work, and let the savings both fund the AI bill and prove the transformation is real. Clean story for the next board meeting.

Then the leaders running it go quiet. Because they can see what the mandate did not account for. AI has not yet added enough value to justify cutting those people. The tools are real and the gains are real in places, but the proof that would make the cut safe does not exist. So now they hold two instructions that contradict each other: show higher AI investment, and show the headcount reduction that AI was supposed to make possible.

That contradiction is the real story. And it is a board-level problem, not an engineering one.

The number that reset the conversation

In July 2025, MIT's Project NANDA published "The GenAI Divide: State of AI in Business 2025." The headline finding: despite $30 to $40 billion in enterprise spending on generative AI, 95% of organizations are seeing no measurable return. The study was built on 52 executive interviews, surveys of 153 leaders, and analysis of 300 public AI deployments, and the authors are careful to call it a directionally accurate snapshot rather than a final verdict.

The lazy read is "AI doesn't work." That read is wrong, and it is the one that gets people fired. MIT's own conclusion is that the core issue is not the quality of the AI models, but the learning gap for both tools and organizations, the failure to integrate AI into how work actually happens. The CFO-grade reframe that has emerged on top of the study is sharper still: AI ROI failures are usually measurement failures, not technology failures. The value is usually unproven rather than absent.

Unproven and absent look identical on a board slide. That is the entire problem. When you cannot measure the value, you cannot distinguish "this is working and we can't see it" from "this isn't working," and you end up making a workforce decision on a coin flip you've labeled a strategy.

Why the CFO is genuinely stuck

Sit in the CFO's chair. AI spend is climbing and has no natural ceiling, because every seat with an AI tool can consume more this month than last. Enterprise AI platform bills are scaling past a million dollars a month at the high end, and as one Nvidia executive put it bluntly, for his team the cost of compute is far beyond the cost of the employees.

The board wants to see that investment growing, because flat AI spend reads as falling behind. But growing spend with no provable return is not a transformation. It is an exposure waiting for a follow-up question. So the instinct becomes: find the offsetting value, and the most legible offset on a spreadsheet is headcount. Cut people, book the savings, present the savings as the return.

The trap is that the savings are visible on day one and the cost arrives on day ninety. You booked a number you could see and traded away a capability you couldn't. And because nobody measured what those people actually did against what AI can actually do, the trade was made blind.

The story no board wants to repeat

Klarna is the case study that belongs on the table.

In early 2024, the company said its OpenAI-built assistant could do the work of 700 customer service agents, handling 75% of customer chats, about 2.3 million conversations, within a month of launch. It projected the assistant would drive a $40 million profit improvement. On paper, exactly the story the mandate wants.

Then quality fell, complaints rose, and Klarna started hiring humans back. CEO Sebastian Siemiatkowski told Bloomberg, in the line that became the inflection point for the whole category: "As cost unfortunately seems to have been a too predominant evaluation factor when organizing this, what you end up having is lower quality." The lesson sits in one preposition. Klarna did not fail by using AI. It failed by using AI instead of people rather than alongside them.

Klarna is not an outlier. It is the leading edge of a pattern. Orgvue's 2025 survey of more than 1,000 senior leaders found that 39% had made employees redundant as a result of deploying AI, and of those, 55% admit they made the wrong decisions about those redundancies. Gartner now forecasts that by 2027, 50% of companies that attributed headcount reduction to AI will rehire staff to perform similar functions, under different job titles. Fire-and-rehire is not a strategy. It is the most expensive possible way to learn what your people were actually doing.

The argument I keep hearing, and where it is only half right

There is a camp that says: never cut people because of AI, because the only real constraint is your own imagination. If you are cutting, you have run out of ideas, not run out of work.

Ethan Mollick is the clearest voice here, and he is mostly right. He frames job loss as a choice leadership will face, and potentially execute badly, and his warning is the line to pin to the wall: "I worry that without imagination, organizations will think automation is the way to go." Automation looks like the only move when you have stopped imagining what a freed-up, AI-leveraged team could build instead.

But the purist version of this is also half wrong, and pretending otherwise insults the room. Some low-end, repetitive work genuinely should be automated. Moving data between systems, processing forms, routing tickets, drafting boilerplate. That is not a creativity opportunity waiting to be unlocked. It is toil, and removing it is good.

The honest data sits between the two camps. AI is automating tasks far faster than it is automating jobs, because of what researchers call the jagged frontier. In the Harvard and BCG field experiment with 758 consultants, those using AI on tasks inside its frontier completed 12.2% more tasks, 25.1% faster, at significantly higher quality. On a task chosen to sit outside the frontier, consultants using AI were 19 percentage points less likely to produce the correct solution, because the model was confidently and plausibly wrong. A fair caveat: that study used 2023-era GPT-4, and reasoning models, longer context, and agentic tools have since moved many tasks from outside the frontier to inside it. But that movement is exactly the point. The line between automatable and not is real, it is invisible, and it shifts under you. Almost every real job is a mix of both kinds of tasks. That is precisely why you can safely automate slices of work and still get burned automating a whole person.
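
To make the task-versus-job distinction concrete, here is a minimal sketch of how a team might label the tasks inside a single role. Everything in it is an assumption for illustration: the task names, the correctness rates, the `TaskResult` type, and the 5-point margin are hypothetical, not figures from the study or from any real team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One task within a role, scored on sampled outputs (hypothetical data)."""
    task: str
    correct_with_ai: float     # share of AI-assisted outputs judged correct
    correct_without_ai: float  # share of unassisted outputs judged correct


def classify(result: TaskResult, margin: float = 0.05) -> str:
    """Label a task relative to the frontier using an assumed margin.

    'inside'  : AI-assisted work is clearly more often correct
    'outside' : AI-assisted work is clearly less often correct
    'unclear' : the difference is within the margin, so keep measuring
    """
    delta = result.correct_with_ai - result.correct_without_ai
    if delta >= margin:
        return "inside"
    if delta <= -margin:
        return "outside"
    return "unclear"


# A single hypothetical support role, broken into its tasks.
role = [
    TaskResult("route tickets", 0.97, 0.90),
    TaskResult("draft boilerplate", 0.95, 0.88),
    TaskResult("diagnose novel billing dispute", 0.60, 0.84),
    TaskResult("summarize call notes", 0.91, 0.90),
]

labels = {r.task: classify(r) for r in role}
for task, label in labels.items():
    print(f"{task}: {label}")
```

In this made-up role, two tasks land inside the frontier, one lands outside, and one is too close to call. The role as a whole is a mix, which is the situation the paragraph above describes. Because the frontier moves, a labeling like this would need to be rerun as the tools change.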

So the resolution is not "never cut" and it is not "cut to fund the bill." It is that you cannot tell which is right without measurement.

What the board is actually missing

Three things sit in three separate systems. What you spent on AI, and on whom. What those people and tools actually produced. And which specific tasks AI is genuinely carrying out versus which it is failing quietly. Until those three are connected, every headcount call is a guess wearing the costume of a financial decision.
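
As a rough illustration of what connecting those three systems could look like, the sketch below joins spend, output, and task-level checks by team. The team names, dollar amounts, output counts, and the `connect` helper are all hypothetical. A real version would pull from finance, delivery, and quality systems, and would need an agreed definition of a unit of output.

```python
from typing import Optional

# System 1: AI spend per team, in dollars per month (hypothetical figures).
spend_by_team = {"support": 42_000.0, "platform": 18_000.0}

# System 2: accepted units of work per team in the same month (hypothetical).
output_by_team = {"support": 8_400, "platform": 900}

# System 3: spot checks of AI-handled tasks: (team, task, passed review).
task_checks = [
    ("support", "route tickets", True),
    ("support", "draft refund reply", True),
    ("support", "resolve billing dispute", False),
]


def connect(spend: dict, output: dict, checks: list) -> dict:
    """Join the three sources into one per-team view."""
    report = {}
    for team, dollars in spend.items():
        if team not in output or dollars <= 0:
            continue  # cannot compute output per dollar without both numbers
        results = [passed for t, _task, passed in checks if t == team]
        pass_rate: Optional[float] = (
            sum(results) / len(results) if results else None
        )
        report[team] = {
            "output_per_dollar": output[team] / dollars,
            "task_pass_rate": pass_rate,  # None means nobody has checked
        }
    return report


report = connect(spend_by_team, output_by_team, task_checks)
for team, row in report.items():
    print(team, row)
```

The `None` pass rate for the platform team is deliberate in this sketch. A team with spend and output but no task-level checks is a team where a headcount call would still be a guess.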

This is also why "the tool's own dashboard will tell us" keeps failing. Vendor dashboards report usage, not outcomes. Usage tells you the seat is active. It does not tell you whether output rose faster than spend, or whether the roles on the cut list were doing the routine 80% or the irreplaceable 20%. Measuring an AI rollout by tokens consumed is like measuring a sales team by miles driven. The activity is not the result.
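
The gap between usage and outcomes can be shown with simple arithmetic. The sketch below uses hypothetical quarter-over-quarter numbers and two illustrative helpers, `growth` and `output_outpaced_spend`. It is one possible way to frame the comparison, not a standard metric.

```python
def growth(previous: float, current: float) -> float:
    """Fractional change from one period to the next."""
    if previous <= 0:
        raise ValueError("previous period value must be positive")
    return (current - previous) / previous


def output_outpaced_spend(
    prev_output: float, cur_output: float,
    prev_spend: float, cur_spend: float,
) -> bool:
    """True only if output grew faster than AI spend over the period."""
    return growth(prev_output, cur_output) > growth(prev_spend, cur_spend)


# Hypothetical quarter: the usage number looks great on a vendor dashboard.
active_seats = growth(80, 120)          # seats in use rose 50%
output = growth(1_000, 1_100)           # accepted units of work rose 10%
spend = growth(50_000.0, 65_000.0)      # AI bill rose 30%

print(f"active seats: {active_seats:+.0%}")
print(f"output:       {output:+.0%}")
print(f"spend:        {spend:+.0%}")
print("output outpaced spend:",
      output_outpaced_spend(1_000, 1_100, 50_000.0, 65_000.0))
```

In this invented quarter, the usage line says adoption is up by half, while the outcome line says spend grew three times as fast as output. Only the second comparison addresses the question the board is asking.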

Worth saying plainly, because it cuts against the panic: most of the "AI layoff" narrative is overstated. Gartner's own data shows only about a fifth of customer service leaders have actually reduced staffing due to AI, with most reporting that headcount is steady even as they serve more customers. Which makes the reversals more instructive, not less. The companies that cut hardest on an unmeasured assumption are the ones now quietly rehiring.

The uncomfortable part

The companies that win the next eighteen months will not be the ones that cut fastest or spend the most. They will be the ones that could see, before they cut, which work AI had genuinely absorbed and which work only looked absorbable until the quality scores came back. Measurement is not the unglamorous footnote to the AI story. It is the thing standing between a real productivity gain and a public apology.

Know what AI is doing. Prove what it is worth. Then, and only then, decide what to change.

So here is the question worth putting on your own table before the next board meeting. If someone asked you tomorrow for output per dollar of AI spend, not usage, not licenses, output, could you answer it? And if the answer is no, what exactly is the headcount decision standing on?

Cut creativity, and you will be rehiring by Q3. Cut toil you have actually measured, and you will be one of the five percent.

Sudheer Bandaru

Founder, CEO

Sudheer started as a software developer in Silicon Valley and worked at startups and large corporations such as Merrill Lynch, AT&T, and Hewlett Packard. He moved into engineering leadership roles at startups that went public, led multiple M&As in the US, and managed remote global teams. During his career, there were many instances where he felt that the lack of a data-driven culture for continuous process improvement led to poor gut-based decisions and costly mistakes. This problem led him to start Hivel, which helps engineering teams continuously improve via access to critical metrics using interactive dashboards and actionable insights.