Theory of Change

How we turn the frontier of artificial intelligence into the critical decision-making infrastructure of a thriving civilization — the change we exist to create, and the path we believe gets us there.

How to read this document

A theory of change is not a mission statement with numbers stapled to it. It is a falsifiable causal argument: a chain of claims, each of the form if we do this, then that will follow, because of this mechanism — and here is the assumption that has to hold, and here is how we would know we were wrong. We hold ourselves to that standard because it is the same standard we hold everyone else to. Our principle Outcomes, Not Activity says an institution should be judged by the change it causes, not the work it performs; Provenance on Every Claim says no assertion travels without its evidence. A theory of change that cannot be falsified would violate both. So this document is written to be argued with. Where a link in the chain is weak, we say so. Where a number is not yet measured, we say "to be measured" rather than inventing one.

It is also written the way we write everything — by the six methods that govern the rest of our work: stripping the problem back to what is actually true (first-principles thinking), braiding fields that rarely talk to each other (multidisciplinary synthesis), naming the load-bearing assumption everyone shares and asking what follows if it is wrong (contrarian reframing), anchoring claims in real deployments rather than what sounds right (evidence over anecdote), reasoning about whole systems and the layers they sit in (systems and structure), and meeting the strongest version of every objection head-on (steelmanned honesty). The theory below is shaped by those methods, not decorated with them.

1. The problem we exist to solve: a throughput crisis in collective intelligence

Europe does not have an AI awareness problem. By 2024 every board, ministry, and university rectorate had discussed AI; the tools themselves — frontier models, copilots, retrieval systems, agents — are abundant, cheap, and a single procurement decision away. The gap is downstream of awareness and downstream of access. It is the distance between a cognitive capability existing in the world and an institution actually converting it into a better decision — a wiser policy, a faster diagnosis, a sounder investment, a fairer vote.

We call the binding constraint by its real name. Following our first principle, Bandwidth, Not Will: the limiting factor on good collective decisions was almost never a shortage of good intentions or good character. It was throughput — the sheer cognitive bandwidth required to read everything relevant, weigh it honestly, and decide well, at the scale and speed modern institutions demand. For the first time in history, that throughput is buildable. A think tank that treats the problem as a deficit of will (more training, more exhortation, more strategy decks) will fail, because the deficit was never will. It was bandwidth, and bandwidth is now an engineering problem.

Consider the shape of the crisis concretely:

A national ministry runs a public consultation and receives 40,000 written responses. Under the old constraint, a handful of officials skim a sample, and 95% of what citizens actually said never reaches the decision. The will to listen was present; the bandwidth was not.
A parliament votes on an omnibus bill of 1,200 pages that no single member has read in full. The legitimacy of the vote rests on a fiction. This is precisely the failure our essay on Evidence Before the Vote dissects: a legislature that cannot read what it decides on is deciding on faith, not evidence.
A hospital trust holds, in its records, the answer to which of two treatment pathways serves its population better — but no clinician has the hours to extract it, so the question goes unasked and the worse pathway persists for a decade.
A science funder receives 2,000 grant applications, assigns each to three overworked reviewers, and allocates a billion euros through a process whose noise has been measured, in study after study, to rival its signal. This is the problem our work on decentralized science and the artificial scientist takes head-on.

In every case the institution is not lazy, corrupt, or stupid. It is bandwidth-bound. The translation gap — between capability and decision — has, in our diagnosis, five roots, and naming them is what keeps our interventions from being arbitrary:

Absorptive capacity. Most institutions — especially the SMEs and mid-sized public bodies that make up the bulk of European life — lack the internal skill to specify, deploy, and trust an AI system against a real decision. The bottleneck is organisational, not technological.
Decision architecture. Value is realised only when intelligence is wired into how an institution actually decides — who sees what, when, and with what authority. Most organisations bolt tools onto unchanged workflows and are then surprised that nothing changes. Re-engineering the decision, not the tool, is the work.
The ownership question. As we argue in Intelligence Is Infrastructure, institutions have slotted the most important general-purpose capability since electrification into the accounting category of office software — renting cognition by the seat rather than building, owning, and governing it like roads, grids, and clean water. Rented cognition cannot be inspected, cannot be guaranteed, and cannot be trusted near a constitutional decision.
Trust, provenance, and governance. Under the EU AI Act and sectoral regulation, no serious leader will put AI near a consequential decision until provenance and accountability are legible. Absent that, adoption stalls at the demo stage. Our insistence on Provenance on Every Claim is not an aesthetic preference; it is the precondition for adoption.
The is/ought confusion. The deepest failure is conceptual. Institutions either refuse AI entirely (fearing it will make value judgements machines have no business making) or surrender to it entirely (letting a model launder a political choice as a technical output). Our principle The Machine Does the Is, the Human the Ought draws the line: machines marshal evidence about what is; humans retain authority over what ought to be. Most failed AI-in-government projects died because they blurred it.

ENSI — the European Nexus for Strategic Intelligence — exists to close this translation gap across the institutions that run a civilization: firms, yes, but also governments, parliaments, courts, hospitals, universities, and science funders. The subtitle of this site is not rhetoric. We are trying to turn the frontier of artificial intelligence into the critical decision-making infrastructure of a thriving civilization. Everything below is the causal argument for how a small institute moves a problem that large.

2. What we mean by "strategic intelligence" — and how we measure it

Our headline construct cannot be a slogan, or the whole theory rests on sand. We define strategic intelligence as an institution's capacity to convert information into good decisions under uncertainty — and, for public institutions, to do so legitimately. We hold ourselves to four measurable dimensions:

Decision quality. The share of consequential decisions that, audited after the fact, used the best available evidence and reasoning. Measured by structured decision audits and by calibration scoring — did the decision-makers' stated probabilities match observed outcomes? A treasury that says "70% likely" and is right 70% of the time is well-calibrated; most are not, and the gap is measurable.
Decision velocity. Time from question to committed decision on a defined class of recurring decisions — a planning-permission ruling, a credit decision, a market-entry assessment, a clinical-pathway choice. Velocity matters because a correct decision delivered too late is a wrong decision.
Problem-solving throughput. The volume of non-routine problems an institution resolves per quarter without adding headcount — the direct operationalisation of Bandwidth, Not Will.
Legitimacy. For public decisions, quality and speed are not enough; the decision must be trusted. We measure whether affected parties can see the evidence behind a ruling and trace it to source — the operational test set by Evidence Before the Vote and Provenance on Every Claim.

If an ENSI intervention does not move at least one of these four against the institution's own pre-intervention baseline, it has failed — regardless of how many tools were deployed or how impressive the demo. That discipline is the entire point. It is also why our own essays return, again and again, to measurement: Outcomes, Not Activity is not a tagline we apply to others and exempt ourselves from.
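
The calibration scoring named under decision quality can be made concrete. What follows is a minimal sketch, not ENSI's audit tooling: the function names, the ten-bin grouping, and the sample forecasts are all illustrative assumptions. It computes a Brier score (the mean squared gap between a stated probability and what happened) and a small table comparing stated probabilities with observed frequencies.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    return sum((p - float(happened)) ** 2 for p, happened in forecasts) / len(forecasts)


def calibration_table(forecasts, n_bins=10):
    """Group forecasts by stated probability and compare with observed frequency.

    Returns rows of (mean stated probability, observed frequency, count).
    """
    bins = defaultdict(list)
    for p, happened in forecasts:
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, happened))
    table = []
    for index in sorted(bins):
        members = bins[index]
        mean_stated = sum(p for p, _ in members) / len(members)
        observed = sum(1 for _, happened in members if happened) / len(members)
        table.append((round(mean_stated, 3), round(observed, 3), len(members)))
    return table


# Hypothetical audit data: (stated probability, did the outcome occur?)
forecasts = (
    [(0.7, True)] * 7 + [(0.7, False)] * 3    # "70% likely", right 7 times in 10
    + [(0.9, True)] * 5 + [(0.9, False)] * 5  # "90% likely", right 5 times in 10
)

for stated, observed, count in calibration_table(forecasts):
    print(f"stated {stated:.2f}  observed {observed:.2f}  n={count}")
print(f"Brier score: {brier_score(forecasts):.3f}")
```

In this invented sample the "70% likely" calls are well calibrated and the "90% likely" calls are overconfident; that second row is the measurable gap the definition refers to.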

3. The eight principles as the engine of the theory

Our eight principles are not branding. Each is a load-bearing causal commitment, and together they are the engine that makes the chain in Section 5 turn. We state them here as what they are — design constraints on every intervention we run:

Bandwidth, Not Will. Build throughput, not exhortation. Every intervention must add cognitive bandwidth to a decision, or it is theatre.
Intelligence Is Infrastructure. Institutions must build and own their cognitive capability, not rent it by the seat. We bias every deployment toward owned, inspectable, portable systems.
Evidence Before the Vote. No consequential decision should be taken on information no human could read. We wire evidence into the decision before the decision, not after.
From Gatekeeper to Editor. The human role shifts from producing the first draft to editing the machine's — and the institution must be redesigned around that shift, not against it.
The Machine Does the Is, the Human the Ought. Machines establish facts; humans own values. Every system we build keeps the seam between them visible and contestable.
Outcomes, Not Activity. Judge by change caused, not work done. This is the measurement spine of the whole theory of change.
Provenance on Every Claim. Every machine output carries its sources, so a citizen, a regulator, or a judge can trace it. Provenance is the precondition for trust, and trust is the precondition for adoption.
The Whole Stack. We reason about the entire civilization stack — model, data, institution, law, legitimacy — not one layer in isolation. Interventions that fix one layer and ignore the layer above it fail at the seam.

A useful way to read the rest of this document: every outcome we claim, every risk we flag, every metric we commit to, traces back to one of these eight. They are the axioms; the theory of change is the theorem.

4. Our wedge, and why ENSI rather than the market

The obvious objection — steelmanned, as our method demands — is this: the market is already flooded with people selling AI to institutions. Hyperscalers, the Big Four consultancies, a thousand startups, and every national AI strategy. Why does a small European institute add anything?

Because each of those actors has a structural reason not to close the translation gap. Vendors are incentivised to sell capacity, not to make a client independent of them — the rented-cognition trap we name in Intelligence Is Infrastructure. Consultancies are incentivised to retain dependency; a client who learns to do it themselves is a lost account. Public research is incentivised toward novelty and publication, not diffusion; a method that works is, to the academy, a finished paper, not a deployment to scale. And national strategies fund all of the above without owning the connective tissue between them.

ENSI's additionality — the thing that does not happen without us — is to be the neutral diffusion layer across the public and private decision stack. We are independent (no model to sell, no dependency to protect), we operate across sectors (so a method proven in a hospital can travel to a ministry), and our explicit success condition is to make institutions independent of us. We are not a competitor to vendors or consultancies; we are the layer that makes their outputs compound instead of evaporating when the contract ends.

This dictates a wedge and a sequence, not five parallel bets. Our beachhead is a small number of deep decision pilots in two or three institutional settings where the bandwidth crisis is acute and measurable — for instance a public consultation process, a clinical-pathway choice, and a science-funding allocation. Everything else — the curricula, the venture thesis, the policy work, even the public essays — is sequenced behind the pilots, because each depends on evidence the pilots produce. We run pilots first not because they are easy, but because, by evidence over anecdote, nothing downstream is credible without them.

5. The causal chain

Here is the spine of the theory: five "if–then–because" links, each with the assumption that must hold and the signal that would tell us it has failed. A reader who wants to attack this document should attack these five claims.

Link 1 — Deep pilots produce attributable evidence. If we run instrumented decision pilots inside real institutions and measure quality, velocity, throughput, and legitimacy against a pre-intervention baseline, then we obtain credible, attributable evidence of effect — because we compare the same institution before and after, and against a matched control, rather than against a vague market trend. Example: in a consultation pilot, the baseline is "95% of 40,000 responses never reach the decision-maker"; the post-measure is "100% are read, clustered, and surfaced with provenance, and the decision memo cites response-level evidence." That delta is real and traceable. Assumption: partner institutions grant access to real decisions, not sandboxes. Failure signal: partners who will only run toy pilots.

Link 2 — Evidence travels through people and playbooks, not reports. If we codify what worked into decision playbooks and train a cohort to carry them, then the practice diffuses to new institutions — because capability moves with trained people and reusable templates far more reliably than with PDF reports nobody implements. This is why our education work (drawing on what we have studied in Finnish education and in cognitive enhancement with LLMs) is downstream of pilots, not parallel to them: we teach what we have proven, not what we hope. Assumption: trained people stay in, and circulate within, the European institutional market. Failure signal: cohort destination tracking shows brain-drain out of the target institutions.

Link 3 — Diffusion plus governance unlocks the cautious majority. If proven playbooks arrive together with AI-Act-aligned governance templates and provenance built in, then risk-averse public bodies and firms adopt — because the binding blocker for the majority is not capability but defensibility, exactly as Evidence Before the Vote and Provenance on Every Claim predict. A hospital will not deploy a triage aid it cannot audit; give it one whose every recommendation cites the record it rests on, and the objection dissolves. Assumption: the EU AI Act remains the operative compliance regime and our templates track it. Failure signal: material regulatory change we fail to keep pace with.

Link 4 — Owned infrastructure makes adoption durable. If adopting institutions build and own their cognitive capability rather than renting it, then the gains persist and compound instead of evaporating at the next price hike or model deprecation — because, per Intelligence Is Infrastructure, rented capability is structurally fragile. Example: a ministry that owns its consultation-analysis pipeline keeps it across a vendor's price change and an election; one that rents it loses the capability the night the contract lapses. Assumption: owned/open systems remain within a feasible cost and skill envelope for mid-sized institutions. Failure signal: the owned option becomes uneconomic versus rent.

Link 5 — Compounding institutional intelligence shifts the macro outcome. If a critical mass of institutions in a domain raises decision quality, velocity, and legitimacy, then the domain's performance — competitiveness for an industry, trust and efficacy for a democracy, signal-to-noise for a science system — improves measurably, as our domain essays on the European Single Market, engineering democracy, and science DAOs argue. Assumption — the largest one: our contribution is distinguishable from the secular AI wave lifting everyone. We address this attribution problem directly in Section 7; if we cannot solve it, this link is unfalsifiable and we must say so.

Note what the chain does not assume. It does not depend on AGI arriving, on an "abundance society," or on any speculative future. Those belong to a separate horizon (Section 11) that the five-year plan is deliberately independent of. The chain stands on decisions measurable today.

6. Outcomes, activities, and outputs

We distinguish strictly — as Outcomes, Not Activity demands — between activities (what we do), outputs (what we produce), and outcomes (changes in others' behaviour or condition). Most strategy documents quietly relabel outputs as outcomes; we refuse to.

Activities (sequenced behind the wedge)

Run deep decision pilots (the critical path). Select two or three institutional settings; instrument real decisions; deploy owned, provenance-bearing AI systems against a measured baseline. Output: instrumented pilots with before/after decision data and matched controls.
Codify and govern. Turn pilot results into decision playbooks and AI-Act-aligned governance templates with provenance built in. Output: a versioned, openly published asset library.
Write — as a deliberate intervention, not marketing. Our corpus of principles, methods, and domain essays — on AI-driven eGovernment, the technological republic, strong democracy in the AGI age, Software 3.0, agentic startups, and more — is itself a diffusion mechanism. Ideas reframe what decision-makers believe is possible; the writing is the cheapest, highest-leverage way to move the absorptive-capacity and is/ought roots of the problem. Output: a public, citable body of work that shifts the terms of the debate.
Train carriers of the practice. Short, practice-based programmes — co-delivered with business and technical faculties — that teach the proven playbooks. Output: a trained cohort with tracked destinations. This begins only once pilots yield something worth teaching.
Inform policy and investment from evidence. Feed pilot evidence to policymakers and to investors evaluating AI-enabled institutions. Output: evidence-based policy submissions and an investment-screening rubric grounded in real adoption data — not opinion pieces.

Short-term outcomes (Year 1–2)

Demonstrated decision uplift. Across the first pilots, participating units improve at least one of {quality, velocity, throughput, legitimacy} by a margin exceeding their own baseline variance — a real signal, not noise. (Baseline established per unit in the first month of each pilot.)
A transferable asset base. The first decision playbooks and governance templates exist and have been used by at least one institution that did not run the original pilot — the first evidence that practice travels (Link 2).
A trained cohort that stays. A first cohort completes ENSI training and is placed in or retained by European institutions, with destinations tracked.
Demonstrable shift in the debate. Our essays are cited in at least a handful of real institutional or policy documents — the leading indicator that the writing-as-intervention link carries.
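
One way to read "a margin exceeding their own baseline variance" in the first outcome above is sketched here. It is an assumption-laden illustration, not a committed method: it uses the sample standard deviation of the baseline as the noise measure (so the comparison is in the metric's own units), a threshold of two standard deviations, and invented decision-velocity figures. The function name and the parameter `k` are hypothetical.

```python
from statistics import mean, stdev


def exceeds_baseline_noise(baseline, post, k=2.0, higher_is_better=True):
    """Check whether the post-intervention mean moved beyond baseline noise.

    baseline: repeated pre-intervention measurements for one unit
    post:     measurements for the same unit after the intervention
    k:        how many baseline standard deviations count as signal (assumed)
    Returns (is_signal, improvement, noise).
    """
    noise = stdev(baseline)
    improvement = mean(post) - mean(baseline)
    if not higher_is_better:
        improvement = -improvement  # e.g. fewer days to decision is better
    return improvement > k * noise, improvement, noise


# Hypothetical decision-velocity data: days from question to committed decision
baseline_days = [30, 34, 28, 32, 31]
post_days = [22, 24, 21, 23]

is_signal, improvement, noise = exceeds_baseline_noise(
    baseline_days, post_days, higher_is_better=False
)
print(is_signal, improvement, round(noise, 2))
```

A result that stays inside the noise band is the null case described in Section 9 and would be reported as such.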

Intermediate outcomes (Year 3–5)

Adoption beyond the pilots. A defined population of institutions in our beachhead domains adopts ENSI-originated playbooks. We will not publish a headline percentage until Year-0 measurement gives us a denominator; the target is set in Year 1 against that baseline, and adoption is counted only where it traces to an ENSI asset or person.
Governed adoption in the public sector. Several public bodies use ENSI governance templates to put AI behind a consequential, AI-Act-classified decision with full provenance — the proof that the governance and legitimacy links carry.
Owned, not rented. A meaningful share of adopters run owned or open infrastructure rather than per-seat rentals — the test of Link 4 and of Intelligence Is Infrastructure in practice.
A self-sustaining diffusion loop. Adopting institutions begin generating their own playbooks and training their own people, reducing dependence on ENSI — the success condition that distinguishes us from a consultancy.

7. Indicators — and the attribution problem we refuse to dodge

The hardest honest question for any AI initiative is: how do you know it was you, and not the wave lifting everyone? Most impact reports never ask it. We answer with three commitments, each grounded in a principle.

Counterfactual by design. Wherever feasible, pilot units are compared against matched non-participating units in the same institution or sector, so the measured delta is attributable rather than coincident with the trend. This is Evidence Before the Vote applied to ourselves.
Traceable adoption. An adoption counts toward our outcomes only when it traces to a specific ENSI asset or trained person — Provenance on Every Claim, turned on our own metrics. Diffuse "the sector improved" claims are excluded.
Outcome over output. We report the four decision metrics first. Counts of partnerships, essays published, and tools shipped are inputs to the work, not evidence of impact, and are reported as such — never as headline success. This is Outcomes, Not Activity as an accounting rule, not a slogan.
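
The first two commitments above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of ENSI's measurement pipeline: it uses a simple difference-in-differences estimate as one common way to compare a pilot unit with a matched control, and a hypothetical `Adoption` record whose `traced_to` field stands in for whatever identifier links an adoption to an ENSI asset or trained person. All scores and names are invented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


def difference_in_differences(pilot_before, pilot_after, control_before, control_after):
    """Change in the pilot unit minus change in the matched control.

    The control's change approximates the background trend, so subtracting it
    leaves the part of the pilot's change that is not explained by that trend.
    """
    pilot_change = mean(pilot_after) - mean(pilot_before)
    control_change = mean(control_after) - mean(control_before)
    return pilot_change - control_change


@dataclass
class Adoption:
    institution: str
    traced_to: Optional[str]  # hypothetical ID of an ENSI asset or trained person


def countable_adoptions(adoptions):
    """Keep only adoptions that trace to a specific asset or person."""
    return [a for a in adoptions if a.traced_to]


# Hypothetical decision-quality audit scores (0 to 100) before and after a pilot
effect = difference_in_differences(
    pilot_before=[50, 54, 52], pilot_after=[66, 70, 68],
    control_before=[51, 53, 52], control_after=[57, 59, 58],
)
print(effect)  # pilot rose 16 points, control rose 6, so 10 are attributable

adoptions = [
    Adoption("Ministry A", "playbook-consultation-v1"),
    Adoption("Hospital B", None),  # diffuse improvement, not counted
    Adoption("Funder C", "cohort-1-graduate"),
]
print([a.institution for a in countable_adoptions(adoptions)])
```

The estimate is only as good as the match between pilot and control units, which is why the commitment is phrased as "wherever feasible".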

We deliberately avoid metrics we cannot attribute — national competitiveness rankings, aggregate "trust in government" indices. Movement in those is an aspiration the chain points toward, not a number we will claim credit for. A think tank that takes credit for the weather has abandoned provenance.

8. Assumptions — including the ones most plans hide

The standard assumptions (stakeholder willingness, tool maturity, data availability, supportive policy) all apply, with the mitigations in Section 9. More important are the assumptions large enough to invalidate the whole theory. We name them so they can be watched — steelmanned honesty turned inward.

Access to real decisions. The theory dies if institutions grant only sandboxes. Watch: partners refusing baseline measurement. Response: make instrumented access a precondition of partnership.
People and assets are the diffusion vector. If trained people leave the European market or the practice doesn't travel with them, Link 2 fails. Watch: cohort destination tracking. Response: shift weight toward embedded, in-institution training and toward open assets that diffuse without a person attached.
Our effect is separable from the trend. If we cannot distinguish ENSI's contribution from background AI adoption, our impact claim is unfalsifiable. Response: the counterfactual design in Section 7 is non-negotiable.
The is/ought seam holds. If institutions use our tools to launder value judgements as technical outputs, we will have made decisions worse, not better. Watch: pilots where a contested ought is being smuggled in as an is. Response: the seam from The Machine Does the Is, the Human the Ought is built into every deployment and audited.
The regulatory regime is the EU AI Act as written. Our governance value proposition assumes it stays operative. Watch: material amendment or delay. Response: templates are versioned to track the regulation.
No AGI dependency. We assume nothing about AGI timelines for the five-year plan. If frontier capability stalls, the theory is unaffected; if it accelerates, it is a tailwind, not a load-bearing beam. The speculative case lives in Section 11, quarantined on purpose.

9. Risks and mitigations

Each risk below is paired with a specific response rather than a generic reassurance, and with a watch trigger where one is stated. Several map to dangers we have written about at length.

Pilots show no signal. Trigger: a pilot's post-measure falls inside baseline variance. Response: treat it as information, not embarrassment — publish the null, diagnose whether the gap was absorptive capacity, decision architecture, ownership, or governance, and re-scope before scaling. A theory of change that cannot absorb a null result is propaganda, not science.
Gradual disempowerment. The subtle danger is not a robot uprising but the slow atrophy of human judgement as institutions defer to machines — the problem we dissect in Human–AI Power Dynamics. Response: the gatekeeper-to-editor design keeps humans in the deciding seat as editors of the machine, and we measure whether human judgement is being exercised or quietly surrendered.
Cognitive warfare and capture. An intelligence layer wired into public decisions is a target — for manipulation, poisoning, and influence operations, as our work on cognitive warfare sets out. Response: provenance and ownership are the defence; a system whose inputs are traceable and whose weights are owned is far harder to capture than a rented black box.
Displacement and inequality. AI adoption can concentrate gains and displace workers; pretending otherwise costs credibility. Response: we measure labour effects inside pilots and design playbooks for augmentation over substitution, reporting displacement honestly where it occurs.
Vendor entrenchment. A diffusion layer could accidentally entrench whichever vendor we build on, which is the exact rented-cognition trap we warn against. Response: deliberate vendor-neutrality; assets must run across at least two model providers and favour open weights.
Governance theatre. Templates could become box-ticking that adds cost without safety. Response: every template is validated against a real AI-Act classification in a pilot before release.
Institutional and funding fragility. A small institute is fragile. Response: the sustainability model in Section 10, designed so diffusion revenue reduces grant dependence over time.

10. Inputs and resourcing

A causal chain floats unless it stands on a resource base. The honest version of this plan specifies what it costs to run — the layer most strategy documents omit, and the omission The Whole Stack forbids:

People: a small core team spanning decision science, applied AI engineering, governance and regulation, and partnerships — not a large consultancy bench. The model depends on leverage through partners and published assets, not headcount.
Capital: a blend of catalytic grant funding for the pilot phase and, progressively, earned revenue from the asset library and training as diffusion proves out. The explicit goal is a declining grant share across five years.
Partners: pilot host institutions (a ministry, a hospital trust, a science funder, a mid-sized firm), academic co-delivery faculties, and at least two model/tooling providers for neutrality.
Time: the sequence is deliberate — roughly Year 1 pilots and baselines, Year 2 codification and first diffusion, Years 3–5 scaled adoption and the self-sustaining loop. We resist the temptation to launch all workstreams at once; the wedge comes first.

11. The long-horizon vision (kept separate on purpose)

ENSI's founders hold a longer view, and we state it plainly rather than smuggling it into the five-year plan. As machine intelligence advances toward and perhaps past human capability, the central question stops being competitiveness and becomes constitutional: who owns the intelligence that runs the world, and on whose values does it act? Our essays on defining objectives for AGI, the principles of utopia in the AGI age, and strong democracy for the AGI age argue that a future of broadly shared prosperity is possible — but only if the decision-making infrastructure of civilization is built, owned, and governed democratically rather than rented from a handful of firms. That is the world this institute is ultimately for.

And we ring-fence it. Nothing in the five-year theory of change above depends on that future arriving. The vision sets direction; the plan stands on measured decisions today. Conflating the two is exactly the error that makes most AI strategy documents unfundable and unfalsifiable, and, by first principles, we decline to make it.

12. Learning and adaptation

The plan is a hypothesis, run as an experiment. Quarterly, we review each causal link against its watch-signal and ask not "are we busy?" but "is the link carrying?" Annually, we publish what worked, what produced a null, and what we changed — including failures, because a diffusion institute earns trust by being the most honest reporter of its own evidence. Where a link fails, we re-scope or retire it rather than restating the original plan with more confident adjectives. This is Evidence Before the Vote applied to our own strategy: we do not get to keep a belief that the data has stopped supporting.

13. Stakeholders

Stakeholders are not a flat list; they differ by power and by interest, and we engage them differently.

Champions (high power, high interest): reform-minded leaders inside pilot host institutions and a small number of policymakers focused on state capacity and competitiveness. They co-design and get evidence first.
Enablers (high power, lower interest): regulators, funders, hyperscalers. We keep them informed and aligned; we do not depend on them to act.
Carriers (lower power, high interest): the trained cohort, faculty, practitioners, and the readers of our essays. They are the diffusion vector and receive the most direct investment.
Beneficiaries (broad): the citizens, patients, students, and SMEs on the receiving end of the decisions we are trying to improve. We reach them through carriers and assets, not direct service — but they, ultimately, are who the legitimacy metric is for.

14. Conclusion

ENSI's theory of change is narrow in method and wide in ambition. Narrow: close the throughput gap between AI capability and real institutional decisions, prove the effect where it can be measured, and move the proven practice across Europe through people, assets, and ideas that compound. Wide: the institutions we are trying to upgrade are the ones a civilization runs on — its governments, its parliaments, its hospitals, its universities, its science. We hold that the binding constraint on all of them was never will, but bandwidth; that intelligence must be owned, not rented; that machines should settle the is and humans the ought; and that an institution should be judged by the change it causes, not the work it performs. This document holds itself to that last principle. It is sequenced behind a wedge, staked on five falsifiable links, honest about attribution, independent of any speculative future, and built to show us where we are wrong. That, more than any vision statement, is what makes it a strategy rather than a brochure.
