Outcomes, Not Activity

June 14, 2026

Measure time-to-evidence and what actually changed — not how much was produced. When agents make activity nearly free, every activity metric becomes a lie, and the only honest unit of account left is the outcome.

There is a particular sound an institution makes when it is dying without knowing it. It is the sound of busyness: reports landing, meetings convening, decks circulating, dashboards greening, headcount climbing, budgets clearing. From the inside, it feels like vitality. From the outside — from the vantage of the citizen waiting for a permit, the patient waiting for a diagnosis, the founder waiting for a decision — it feels like nothing is happening at all. Both views are correct. The institution is intensely active and producing almost no value. It has confused the evidence of effort with the evidence of effect, and it has built its entire measurement apparatus on the wrong one.

This is the oldest pathology in organized human life, and it is about to become catastrophic. For most of history, activity was a defensible proxy for outcome. Writing a report took real cognition and real time; if you saw a hundred reports, you could reasonably infer that a hundred problems had been thought about. The proxy held because production was expensive. Effort was scarce, so counting effort told you something. That entire epistemic foundation is now collapsing. When an agent can draft the report, hold the meeting, populate the dashboard, and produce the document at near-zero marginal cost, activity metrics do not merely lose accuracy — they invert. They begin to measure the wrong thing with perfect precision. The institution that measures activity in an agentic world is using a thermometer to weigh itself.

The metrics problem, and why it is about to detonate

Start with the cleanest statement of the disease. Productivity, as ENSI argues in Productivity of Work: How to Analyze, "is a word so overused it has lost its sharpness. In most institutional settings, it's reduced to a vague sense of busyness, a metric for how many boxes have been ticked or how many documents have been shuffled into digital folders. But this is theater." The essay's central claim is the spine of this whole principle: "Productivity without feedback is just output. Productivity without consequence is just rehearsal." Output and rehearsal are exactly what activity metrics reward. They count the rehearsal and pay the actor for showing up to the theater.

What makes activity metrics so seductive is that they are legible — easy to count, easy to compare, easy to defend in a budget meeting. Outcomes are illegible — they arrive late, they are entangled with confounders, they are hard to attribute, and they sometimes embarrass the people who promised them. So institutions do the rational-but-ruinous thing: they optimize for the legible proxy. A research lab counts publications because counting discoveries is hard. A government counts programs launched because counting problems solved is hard. A consultancy counts slides because counting client outcomes would expose how few there are. The proxy is chosen not because it is true but because it is available. And then the proxy quietly becomes the target.

There is a second, subtler mechanism worth naming, because it is the reason activity metrics feel virtuous even as they corrode. Activity is self-reporting and immediate; outcome is external and delayed. A person who writes a report knows instantly that they have written it and can claim the credit today. A person who waits to see whether the report changed anything must defer their reward, tolerate ambiguity, and accept that the answer might be "no." Human and institutional incentives both run on the short clock. So the activity metric does not merely measure the wrong thing — it pays out faster than the right thing, and faster reward beats truer reward in almost every system humans have ever built. This is why activity metrics survive every reform that does not also fix the timing of the payoff. You cannot beat a fast lie with a slow truth unless you make the truth arrive faster, which is exactly what time-to-evidence is designed to do.

That substitution, the proxy quietly becoming the target, has a name, and it is the master key to the whole problem.

Goodhart's law, weaponized by abundance

When a measure becomes a target, it ceases to be a good measure. Goodhart's law has always been a tax on bad metrics. What is new is that agents turn the tax into a wipeout. The reason is mechanical: the cost of producing the signal an activity metric tracks has fallen toward zero, while the cost of producing the underlying value has not. The two were always loosely coupled; now they are decoupled entirely.

Consider how this plays through the structure of work itself. ENSI's Value Creation Ontology of Tasks decomposes knowledge work into a sequence of phases — Evidence, Comparison, Analysis, Solution, Formulation — and shows precisely which sub-tasks AI now performs at scale. In the Evidence phase, the ontology notes, "AI systems are increasingly adept at performing these tasks, especially in terms of automation and consistency"; in Comparison, "automated tools can process large volumes of information quickly." Collecting, reporting, screening, filtering, scoring — the very acts that activity metrics count — are now the cheapest things an organization does. If you reward an agent-equipped team for "reports produced," you will get an infinite supply of reports and no greater supply of the thing reports were supposed to enable: a better decision. The metric will go vertical while the world stays flat.

This is Goodhart's law detonating under conditions of abundance. Every activity metric is a bottle, and agents have just made the liquid free. You will fill every bottle. You will fill bottles you forgot you owned. And the fill rate will tell you nothing, because filling is no longer the constraint. The constraint has migrated upstream — to whether the report was worth producing at all, to whether anyone changed their mind because of it, to whether reality moved.

ENSI makes exactly this argument about where the bottleneck goes. In Democracy Engineering: Citizen Productivity Drivers, the analysis is blunt: "In the agentic era, where machines execute at scale and humans increasingly govern goals, constraints, and rule systems, the bottleneck shifts upstream. Execution becomes cheaper; framing becomes decisive." When execution is free, counting execution is the most expensive mistake an institution can make. Worse: "If the human layer that sets objectives is distorted, automated systems will amplify those distortions with ruthless efficiency." Point an agent fleet at an activity metric and you have not automated value creation — you have automated Goodharting. You have built a machine for manufacturing the appearance of progress at industrial scale.

Productivity is not value, and value is not volume

The deepest confusion underneath activity metrics is the equation of production with value. They are not the same quantity; they are barely the same kind of thing. Production is a flow you can watch. Value is a change in the world that may not be visible for months and may be invisible even then. ENSI's Technological Republic: Growth — From Volume to Value draws the line in exactly these terms, contrasting the "Old Paradigm" — "Growth = GDP increase, VC returns, number of startups… Output-driven, not outcome-driven" — against a "New Paradigm" optimized for "durability, sovereignty, public impact, and moral alignment." The essay's framing is uncompromising: growth "is not defined by what is accumulated — but by what is enabled, protected, and passed on." A society, like an institution, can accumulate enormous volume while enabling almost nothing.

This is why the famous Palantir lesson recounted in that essay matters here. The value of pushing software to the U.S. military in Afghanistan "stemmed not from fancy dashboards, but from helping field units and commanders make better real-time decisions. Institutional intelligence came from pushing software into the operational core, where feedback mattered." Dashboards are activity. Better decisions are outcome. The dashboard count would have looked identical whether or not a single decision improved. Only the outcome — the decision, the consequence at the edge — carried the value, and only proximity to consequence revealed it.

The same essay names the antidote directly under "Growth as Feedback-Driven Design": "Progress must be measurable, correctable, and iterative. Growth that can't respond to failure is fragility in disguise." Read that as a sentence about metrics. An organization that measures activity cannot respond to failure, because activity metrics never report failure — a useless report and a world-changing report both register as "+1 report." The metric is structurally incapable of distinguishing success from waste. It is fragility wearing the costume of momentum.

The point generalizes past the firm and into the economy. ENSI's AI-Driven Economy: The Drivers Impact Calculation insists that the impact of AI be calculated through its effect on real drivers of value rather than through gross measures of how much activity the technology generates — a discipline that matters precisely because agentic tools make raw output explode. An economy that congratulates itself on the volume of AI-generated work has measured the input cost of the new abundance, not the value it created. The same trap, scaled to a continent: more activity, counted with pride, signifying less and less. The honest question is never "how much did the machines produce?" — they will always produce more next quarter — but "which drivers of real value moved, and by how much, and how soon did we know?"

Time-to-evidence as the north-star metric

If activity is the wrong unit and outcome is the right one, the practical problem remains: outcomes are slow and illegible. How do you steer a system in real time on a signal that may not resolve for a year? The answer is to measure not the outcome itself but the velocity at which you generate trustworthy signal about the outcome. Call it time-to-evidence: the elapsed time between forming a belief about what will create value and obtaining real-world data that confirms or kills it.
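To make the definition concrete, here is a minimal Python sketch of how time-to-evidence might be computed. It is an illustration under stated assumptions, not a method taken from the ENSI essays: the `Belief` record, its field names, the sample beliefs, and the choice to charge unresolved beliefs their full age are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class Belief:
    """A claim about what will create value (hypothetical record layout)."""
    name: str
    formed_on: date                      # when the belief was formed
    resolved_on: Optional[date] = None   # when real-world data confirmed or killed it


def time_to_evidence_days(belief: Belief, today: date) -> int:
    """Elapsed days from forming the belief to its verdict.

    Assumption: an unresolved belief is charged its full age so far,
    so beliefs that are never tested cannot hide from the metric.
    """
    end = belief.resolved_on if belief.resolved_on is not None else today
    return (end - belief.formed_on).days


def median_time_to_evidence(beliefs: List[Belief], today: date) -> float:
    """Median across beliefs; the median resists one extreme loop."""
    return median(time_to_evidence_days(b, today) for b in beliefs)


# Illustrative data only.
beliefs = [
    Belief("shorter form raises permit completions", date(2026, 1, 1), date(2026, 1, 11)),
    Belief("weekly digest changes triage decisions", date(2026, 1, 1)),  # still open
]
today = date(2026, 3, 2)
print(median_time_to_evidence(beliefs, today))
```

Charging open beliefs their running age is a design choice: it makes the number worse the longer a belief goes untested, which is the behavior the metric is meant to punish.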

Time-to-evidence is the honest north star precisely because agents cannot fake it. They can fabricate activity infinitely, but they cannot fabricate reality's response. ENSI's productivity framework already isolates this as the core variable under a different name — "Decision-to-Insight Latency," defined as "the time delay between when an insight emerges and when it's converted into an action or strategic shift," with the devastating gloss that "the latency of insight conversion determines whether an idea lives long enough to matter." The companion measure, "Feedback Loop Sharpness" — "how quickly and precisely systems deliver responses to actions" — completes it. The startup's structural advantage over the academy, the essay shows, is almost entirely a latency advantage: feedback in one day versus sixty, insight-to-decision in two hours versus ninety-six. Nothing about the quality of the people differs. Everything about the speed of contact with consequence does.

This is also why ENSI insists that genuine productivity requires "epistemic skin in the game" — "how personally exposed an individual or group is to the consequences of being wrong." Time-to-evidence operationalizes skin in the game as a metric. It asks: how fast does this institution find out it was wrong? An institution with high time-to-evidence is one that can be wrong for years without noticing — the exact condition under which activity metrics flourish and value dies. The task ontology's very first phase is literally named "Evidence," because evidence-gathering is where value creation begins; an institution that is slow to evidence has not started the work, no matter how many documents it has produced.

The agentic enterprise makes this concrete. ENSI's Company as Agentic Workflow reframes the entire organization as an experimentation engine in which "hypotheses are the atoms of learning" and "every major advantage is downstream of an experimentation loop: Generate variants. Run controlled tests. Measure impact with guardrails. Learn and iterate." The essay's pivotal observation is that agents "do more than speed up iteration; they change what iteration is," turning the company into "a living program: continuously rewritten by evidence." In that world the metric that matters is not how many variants you generated (agents make that free) but how fast each variant earned a verdict from reality. Hypotheses resolved per quarter, weighted by the reliability of the test, is the throughput form of time-to-evidence: the shorter each loop, the more verdicts fit into a quarter. It is far harder to cheat than "experiments launched," because launching is activity and resolving is outcome.
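The reliability-weighted count described above could be sketched roughly as follows. The `Experiment` record, the 0-to-1 reliability scale, and the sample quarter are assumptions made for illustration; the source essay does not specify a formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experiment:
    name: str
    resolved: bool       # True only once reality has returned a verdict
    reliability: float   # assumed 0.0 to 1.0 trust in the test design


def experiments_launched(experiments: List[Experiment]) -> int:
    """Activity count: rises with every launch, resolved or not."""
    return len(experiments)


def weighted_verdicts(experiments: List[Experiment]) -> float:
    """Outcome-side count: only resolved tests score, scaled by reliability."""
    return sum(e.reliability for e in experiments if e.resolved)


# Illustrative quarter: three launches, two verdicts.
quarter = [
    Experiment("variant A", resolved=True, reliability=0.5),
    Experiment("variant B", resolved=True, reliability=0.25),
    Experiment("variant C", resolved=False, reliability=0.9),
]
print(experiments_launched(quarter), weighted_verdicts(quarter))
```

Launching more variants moves the first number and leaves the second untouched until a test actually resolves.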

Redesigning institutional measurement around outcomes

So how do you rebuild measurement so that it tracks the world rather than the work? Four moves follow from everything above.

First, instrument consequence, not production. ENSI's Civilization Metrics: Perspectives on Civilizational Components is an entire taxonomy built to resist GDP-style volume counting, organizing measurement around dynamic-systems properties like the "Adaptive Capacity Score," "Systemic Resilience Metric," and "Rate of Innovation Diffusion" — measures of how the system behaves under change, not how much it emits. The lesson for any institution: replace "documents produced" with "decisions changed," "meetings held" with "disputes resolved," "programs launched" with "problems measurably reduced." Each replacement swaps a production count for a consequence count. Each is harder to game because each requires reality to cooperate.
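As a small illustration of swapping a production count for a consequence count, the sketch below tallies the same set of reports both ways. The `Report` record and its `decision_changed` flag are hypothetical; in practice that flag would have to be recorded by whoever owns the decision, not by the author of the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    title: str
    decision_changed: bool  # hypothetical flag: did a real decision change because of it?


def production_count(reports: List[Report]) -> int:
    """'Documents produced': every report counts as +1, useful or not."""
    return len(reports)


def consequence_count(reports: List[Report]) -> int:
    """'Decisions changed': only reports that moved a decision count."""
    return sum(1 for r in reports if r.decision_changed)


# Illustrative data only.
reports = [
    Report("Q1 status deck", decision_changed=False),
    Report("Permit backlog analysis", decision_changed=True),
    Report("Q2 status deck", decision_changed=False),
]
print(production_count(reports), consequence_count(reports))
```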

Second, shorten the loop deliberately, because measurement quality is downstream of feedback speed. Democracy Engineering treats "Reality Contact" — "the frequency and intensity with which a person engages directly with real-world constraints, consequences, users, failures" — as a first-class driver, warning that "without reality contact, contribution becomes ideological, speculative, or performative." Performative is the native register of activity metrics. The cure is structural: shorten the distance between decision and impact until evidence arrives fast enough to steer by. An institution measuring time-to-evidence will naturally redesign itself to lower it, the way a startup does — and lowering it is itself the reform.

Third, defend the metric against its own gaming. Because any outcome metric, once it becomes a target, invites a new round of Goodharting, Company as Agentic Workflow builds guardrails into the unit of measurement itself: every hypothesis must define a "primary metric + guardrails + stopping rule," and agents are tasked to "prevent 'local metric wins' that harm the system." Outcome measurement is not a single number; it is a primary signal plus a set of guardrails that catch the system optimizing the signal at the expense of the thing the signal was a proxy for. This is Goodhart's law turned into an engineering discipline rather than a lament.
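One way to picture "primary metric + guardrails + stopping rule" as a unit of measurement is the sketch below. The class names, thresholds, and verdict strings are assumptions for illustration, and the check compares raw relative changes without any statistical testing, which a real experiment platform would need.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Guardrail:
    metric: str
    max_allowed_drop: float  # tolerated relative decline, e.g. 0.02 means 2%


@dataclass
class Hypothesis:
    statement: str
    primary_metric: str
    min_lift: float          # relative lift needed to accept
    guardrails: List[Guardrail]
    max_days: int            # stopping rule: give up after this many days


def verdict(h: Hypothesis, changes: Dict[str, float], days_elapsed: int) -> str:
    """Return a verdict from observed relative changes per metric."""
    # Guardrails are checked first: a primary win never excuses a breach.
    for g in h.guardrails:
        if changes.get(g.metric, 0.0) < -g.max_allowed_drop:
            return "reject: guardrail breached"
    if changes.get(h.primary_metric, 0.0) >= h.min_lift:
        return "accept"
    if days_elapsed >= h.max_days:
        return "stop: no effect within budget"
    return "continue"


# Hypothetical hypothesis and metric names.
h = Hypothesis(
    statement="Pre-filled forms raise weekly permits issued",
    primary_metric="permits_issued_per_week",
    min_lift=0.05,
    guardrails=[Guardrail("approval_accuracy", 0.02)],
    max_days=30,
)
print(verdict(h, {"permits_issued_per_week": 0.10, "approval_accuracy": -0.05}, 10))
```

Checking guardrails before the primary metric is the point of the design: a "local metric win" that damages a guarded quantity is rejected even when the primary number looks good.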

Fourth, accept that this is a governance problem, not a dashboard problem. ENSI's Decision Intelligence Canvas frames the institution as a decision-producing system whose quality is judged by the quality and speed of decisions, not the volume of analysis feeding them — and ENSI's broader civilizational argument, in Civilization Stack: The Framework for AI Age, is that the institutions holding society together must be re-architected, not merely re-instrumented. You cannot bolt an outcome metric onto an organization whose incentives, careers, and budgets all reward activity. The metric will be metabolized and excreted as more activity. The measurement reform and the incentive reform are the same reform.

There is a way to test whether a candidate metric belongs to the activity family or the outcome family, and it is worth making explicit because it is the operational core of this principle. Ask of any proposed metric: could an agent inflate this to infinity without making the world one degree better? If yes — reports, meetings, lines of code, tickets closed, models trained, pages published — it is an activity metric and will betray you the moment the agents arrive, if it has not betrayed you already. If no — because the metric is anchored to a change in an external state the institution does not control, a decision a real person reversed, a cost a real citizen no longer bears, a problem a real user no longer has — then it is an outcome metric, and it will keep its meaning no matter how cheap production becomes. The test is not subtle, but most institutions have never once applied it to their own dashboards. They have simply assumed that because a number is rising, something good is happening. In the agentic era that assumption is not merely naive; it is the specific cognitive failure that will hollow out the institutions least willing to question it.

The honest ledger

Here is the uncomfortable synthesis. For a generation, "do more" was a coherent instruction because doing was hard and doing more usually meant achieving more. Agents have severed that link permanently. In a world where activity is free, the institution that keeps measuring activity is not neutral — it is actively deceiving itself, and at machine speed. It will generate the most beautiful productivity charts in its history during the precise period its real value collapses, because its instruments are calibrated to the one quantity that no longer means anything.

The way out is not more dashboards. It is a change in the unit of account. Measure what changed in the world. Measure how fast you found out whether you were right. Treat time-to-evidence as the north star and consequence as the only currency. As ENSI puts it in the productivity essay, the goal is to understand "what makes work matter — and what makes thinking productive. Because if we can understand that, we don't just make better careers. We build better minds, better institutions, and ultimately, better civilizations." An institution that measures outcomes is forced to confront reality continuously, and an institution in continuous contact with reality is the only kind that learns fast enough to deserve the agents it is about to be handed. Everything else is rehearsal — performed, now, by tireless machines, for an empty house.
