The Pilot That Never Ends

The most common outcome of a healthcare AI pilot is not success or failure. It's another pilot.

By Dr Omer Atli · 3 June 2026 · 10 min read

Walk into almost any hospital with a budget line for innovation and you'll find the same artefact: a tool that has been "in pilot" for eighteen months. It was going to transform triage, or flag the deteriorating patient, or cut the documentation burden. It has a champion, a slide deck, and a small cohort of users who quite like it. What it does not have is a decision. Nobody has switched it off. Nobody has rolled it out. It exists in a permanent grey zone where it can neither prove itself nor disgrace itself — and everyone involved is, quietly, fine with that.

This is the perpetual pilot, and it is the dominant life-form in healthcare AI. Not the spectacular failure, which at least teaches something. Not the clean win, which gets deployed and forgotten. The pilot that simply continues — accruing logos, conference slides, and "deployed in N hospitals" arithmetic — while answering no question at all. A pilot that can't fail isn't evidence. It's theatre with a start date.

Anatomy of the perpetual pilot

You can recognise one before anyone tells you it's struggling, because the shape is always the same.

It runs on a small, friendly cohort. One enthusiastic ward, a handful of clinicians who volunteered because they're the sort who volunteer, a champion who genuinely believes. This is sensible on day one and corrosive by day ninety, because the cohort was selected for receptiveness, not representativeness. The tool is being tried by exactly the people most inclined to make it work and least likely to break it. Whatever the pilot learns, it learns from the best possible conditions — which is the one place a real deployment will never live.

The success criteria are vague, or worse, defined after the fact. Ask at the start what "working" would mean and you'll often get a number nobody committed to, or a feeling — "better clinician experience", "improved flow". When the data comes in ambiguous, as it almost always does, the criteria quietly bend to fit it. A modest signal becomes "promising early results". A null result becomes "needs a longer evaluation window". The goalposts were never planted, so they can't be moved; they just drift to wherever the ball happened to land.

And it ends, if it ends at all, with a press release rather than a decision. "Trust trials AI tool to support frontline staff." Note the verb. Trials: simple present, the tense of habits and standing arrangements, of things that go on without ever resolving. The output of the pilot is the announcement of the pilot. The work product is the existence of the work. At no point does a named human stand up and say: this is good enough to deploy at scale, or it isn't, and here is which, and here is why.

That absence is not an oversight. It is the entire point of the structure.

Why everyone is paid to keep it alive

The perpetual pilot persists because it is the equilibrium outcome of everyone's incentives. Not a conspiracy, an alignment. Every party to it gets exactly what they came for the moment the pilot starts, and needs nothing further from its ending.

The vendor gets a logo and a sentence. "Live in fourteen trusts" is a fundraising asset whether or not a single one has committed to buy, because "live" is doing enormous, undefined work. A pilot is a customer-shaped object that costs the vendor almost nothing and converts directly into the next round's deck. Conversion to a real contract would be better revenue — but it also introduces procurement, scrutiny, and the risk of a no. A warm pilot in hand is worth more than a cold decision in the bush.

The provider gets innovation visibility without operational risk. The organisation can be seen to be forward-looking — running AI, partnering with the sector, appearing on the panels — while carrying none of the liability that real deployment brings. Nothing has been integrated into the critical path. Nothing has to be supported at 3am. If the tool is quietly mediocre, no one has signed anything that says so. Innovation theatre is cheap precisely because the curtain never goes up on the second act.

The clinical champion gets a publication and a platform. A pilot is a poster, a conference talk, a line on the CV that says digital leadership. The reward for the champion is the doing, which is fully banked at launch. The reward for a deployment decision — especially a negative one — is mostly grief: the awkwardness of telling a partner it didn't work, the colleagues who'll say they told you so.

And the thread running through all three: nobody's job depends on the conversion decision, so nobody makes it. There is no role called "the person who decides whether this becomes real." The pilot is owned by everyone at the start and no one at the end. To deploy is to take on accountability. To stop is to admit a sunk cost. To continue is to do neither — and continuing is free. Of course it persists. We built a system in which the rational move, for every single actor, is to never finish.

The cost of not deciding

The comfortable assumption is that an endless pilot is harmless — a tool sitting benignly in the background, helping a few people, hurting no one. That assumption is wrong on three counts.

First, pilots are not free, because the resource they consume is clinical. A pilot runs on staff time and attention, and in a hospital those are not slack commodities lying around waiting to be used. They are the same finite supply that patient care runs on. Every clinician learning an interface, attending the feedback session, working around the tool's rough edges, or double-checking its output because they don't quite trust it yet, is spending attention that had somewhere else to be. A pilot that delivers no decision has still drawn down the one budget a clinical environment can least afford to spend on nothing.

Second, the unfailable pilot poisons the well for the next one. Clinicians' enthusiasm is not a renewable resource. Run enough pilots that start with a town hall and end with a shrug, and the workforce learns the lesson you didn't mean to teach: this is what AI is, something that appears, mildly inconveniences us, and evaporates without consequence. So when a genuinely good tool arrives, it meets a workforce that has been trained, pilot by abandoned pilot, to give it nothing. Pilot fatigue is a real cost, and it is paid forward by the better product that had the misfortune to come later.

Third — and this is the one that should sting — the pilot that can't fail generates no learning, which makes it worse than a clean negative result. A pilot honestly designed to be killable, and then killed, tells you something true: this didn't work here, under these conditions, for these reasons. That is expensive knowledge, and it compounds. The perpetual pilot, by contrast, is engineered to never produce that sentence. It yields no finding, confirms no hypothesis, closes no question. A null result you can publish is an asset. A null result you've structured yourself never to reach is a liability wearing the costume of progress.

What a decidable pilot looks like

The fix is not to run fewer pilots. It's to run pilots that are allowed to die. The difference between evaluation and theatre comes down to a handful of features, all of which are about committing to a verdict before you know what the verdict will be.

A decidable pilot has its success criteria pre-registered, and a kill condition stated in the same breath. Before the first patient, in writing: this is what good looks like, this is the threshold, and this is the result that means we stop. The kill condition matters more than the success target, because it is the part the perpetual pilot is built to never have. A pilot you've pre-committed to stop under defined conditions is one you are no longer free to let drift indefinitely.

It has a named owner of the deploy-or-stop decision, and a date by which they decide. Not a committee, not "the programme", not a steering group that meets quarterly and minutes the question to the next meeting — a person, with a deadline, whose explicit job is to read the evidence and call it. Accountability that belongs to everyone belongs to no one. Put a name and a date on the decision and you have converted an open-ended exercise into something that must, eventually, resolve.

It tests under deployment conditions, not best-ward conditions. The pilot lives where the tool would actually have to survive — the ordinary ward on the ordinary night, the average user rather than the champion, the integration as it would really sit in the workflow rather than the demo's clean rails. A tool that only works in the conditions a pilot can curate is a tool that doesn't work, and the honest pilot is the one designed to find that out rather than to flatter past it.

And when the result is negative, it publishes the negative result. This is the cultural keystone, and the rarest. The willingness to say, in public, "we evaluated this properly and it didn't earn deployment" is what separates a sector that learns from one that performs. Negative results are how a field stops paying, over and over, for the same disappointment dressed in next year's branding. They are also, not incidentally, the most useful thing a pilot can produce.
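For readers who think in code, the first two features (pre-registered thresholds with a kill condition, and a named owner with a date) can be written down as a small record that is fixed before the pilot starts. What follows is only a minimal sketch under stated assumptions: the names `PilotProtocol` and `verdict`, the example metric, the owner, and every number are hypothetical illustrations, the metric is assumed to be one where higher is better, and the rule that an ambiguous result at the deadline counts as a stop is one possible convention rather than anything established.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Verdict(Enum):
    DEPLOY = "deploy"
    STOP = "stop"
    PENDING = "pending"  # only reachable before the decision date


@dataclass(frozen=True)  # frozen: commitments cannot be edited after the fact
class PilotProtocol:
    """Hypothetical pre-registration record, written before the first patient."""
    metric: str               # what is measured (assumed: higher is better)
    success_threshold: float  # at or above this, the tool earns deployment
    kill_threshold: float     # at or below this, the pilot stops
    decision_owner: str       # a named person, not a committee
    decide_by: date           # the date by which the owner must call it

    def __post_init__(self) -> None:
        if not self.decision_owner.strip():
            raise ValueError("a decidable pilot needs a named decision owner")
        if self.kill_threshold >= self.success_threshold:
            raise ValueError("kill threshold must sit below the success threshold")


def verdict(protocol: PilotProtocol, observed: float, today: date) -> Verdict:
    """Apply the pre-registered rules to an observed result."""
    if observed >= protocol.success_threshold:
        return Verdict.DEPLOY
    if observed <= protocol.kill_threshold:
        return Verdict.STOP
    if today < protocol.decide_by:
        return Verdict.PENDING
    # Assumption: once the deadline arrives, "ambiguous" is not an outcome.
    # An unproven tool is treated as a stop rather than extended.
    return Verdict.STOP


# Illustrative values only; none of these figures come from a real pilot.
protocol = PilotProtocol(
    metric="relative reduction in documentation time",
    success_threshold=0.30,
    kill_threshold=0.05,
    decision_owner="Named Clinical Lead",
    decide_by=date(2026, 12, 1),
)

print(verdict(protocol, 0.15, date(2026, 9, 1)).value)   # pending
print(verdict(protocol, 0.15, date(2026, 12, 1)).value)  # stop
```

The design choice that matters is the last branch: after `decide_by`, the function cannot return `PENDING`, which is the property the perpetual pilot is built to avoid.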

What this means

There's a tidy symmetry at the heart of this, and it's worth saying plainly: if a pilot can't fail, it can't succeed either. It can only continue. The two outcomes are bought with the same currency — a willingness to be wrong, in writing, in advance, in public — and a structure that refuses to risk the first has quietly forfeited the second. The perpetual pilot looks like caution and is actually its opposite: an elaborate, expensive mechanism for never having to find out.

So the question to ask of any healthcare AI pilot is not whether it's going well. The interesting ones always look like they're going well; that's what they're for. The question is the one the structure is built to dodge: what result would end this, who decides, and by when? If the honest answer is nothing, no one, never, then it was never an evaluation. It was a logo with a launch date — and the field has enough of those already.

Key Takeaways

The perpetual pilot — small friendly cohort, vague or retrofitted success criteria, ending in a press release rather than a decision — is the equilibrium outcome of everyone's incentives in healthcare AI, not anyone's mistake.
Vendors get a logo, providers get innovation visibility without operational risk, champions get a publication — and because no one's job depends on the conversion decision, no one makes it. Continuing is the only move that's free.
Pilots aren't harmless: they spend clinical staff time and attention, the same finite budget patient care runs on, in exchange for no verdict.
Pilot fatigue is paid forward — every abandoned pilot trains the workforce to give the next, possibly better, tool nothing.
A pilot without a pre-registered kill condition, a named decision owner, and a date is marketing, not evaluation. If it can't fail, it can't succeed — it can only continue.

Dr Omer Atli

Physician · Healthcare AI · Emergency & Primary Care
