The Surrogate Endpoint Problem: When the Marker Improves and the Patient Doesn't
A trial can move the number it measures and entirely miss the thing it was meant to fix, and the history of medicine is built on the rubble of confusing the two.
There was a stretch, decades ago, when cardiology had a tidy theory and the drugs to act on it. Patients who survived a heart attack often threw extra, irregular beats, and those irregular beats were associated with sudden death. The logic wrote itself: suppress the beats, prevent the deaths. Drugs existed that flattened the irregularity beautifully — the monitors went quiet, the traces tidied up, the marker did exactly what it was asked to do. So they were given, widely, on the strength of that marker, to people who felt fine.
When someone finally ran the trial that counted the bodies rather than the beats (the Cardiac Arrhythmia Suppression Trial, which reported in 1989), the result was the opposite of the theory. The patients on the drugs were dying more often, not less. The ectopic beats had been suppressed and the people had not been saved, because the drugs that quietened the rhythm also, by a separate mechanism, stopped some hearts altogether. The marker had been a side road. The theory had mistaken it for the highway. By the time the highway was mapped, the cost had been counted in lives, and the lesson was permanent: a number that stands in for survival is not survival, and you do not get to assume the difference away.
That is the surrogate endpoint problem, and it has not gone anywhere. If anything, the pressures that produce it have intensified.
Why surrogates exist — and why the good ones are not a cheat
Start with the honest case, because there is one. The outcomes that actually matter to patients — living longer, living better, not having the stroke, not losing the function — are slow, rare, and ruinously expensive to measure. To show that a drug prevents death you may need thousands of people followed for years, because death, mercifully, is uncommon on any given Tuesday. To show that the same drug lowers a blood marker, you need fewer people, less time, and a machine that already sits in every lab. The surrogate is faster, cheaper, smaller, and answers sooner. None of that is corruption. It is arithmetic.
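The arithmetic can be made concrete with the textbook sample-size formulas for a two-arm trial. What follows is a minimal sketch, not a trial design: the event rates (5% against 4%) and the marker shift (half a standard deviation) are hypothetical numbers chosen for illustration, the function names are invented here, and the formulas ignore follow-up time, dropout and interim analyses.

```python
from math import ceil
from statistics import NormalDist


def z_total(alpha: float = 0.05, power: float = 0.80) -> float:
    """Sum of the two normal quantiles used in the textbook sample-size formulas."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def n_per_arm_event(p_control: float, p_treated: float, **kw: float) -> int:
    """Patients per arm needed to detect a difference between two event proportions."""
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z_total(**kw) ** 2 * variance / (p_control - p_treated) ** 2)


def n_per_arm_marker(difference_in_sd: float, **kw: float) -> int:
    """Patients per arm needed to detect a mean shift given in standard deviations."""
    return ceil(2 * z_total(**kw) ** 2 / difference_in_sd ** 2)


# Hypothetical inputs: deaths fall from 5% to 4%; the marker shifts by 0.5 SD.
n_death = n_per_arm_event(p_control=0.05, p_treated=0.04)
n_marker = n_per_arm_marker(difference_in_sd=0.5)
print(f"per arm, mortality endpoint: {n_death}")
print(f"per arm, marker endpoint:    {n_marker}")
```

Under these assumptions the mortality endpoint needs several thousand patients per arm and the marker endpoint a few dozen. The gap comes almost entirely from how rare the event is and how small the absolute difference between the arms.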
And sometimes the stand-in is genuinely earned. A surrogate deserves trust when there is a real, established causal chain running through it — when moving the marker reliably moves the outcome because the marker sits on the road to the outcome, not beside it. Lower the pressure, prevent the strokes the pressure was causing. There are cases where that chain has been mapped so thoroughly, across so many interventions, that the marker has become a legitimate proxy: shift it and the benefit follows, demonstrated enough times that the extrapolation is no longer a leap. That is the validated surrogate, and it is one of the more useful instruments in evidence-based medicine. It lets good drugs reach patients years before a mortality trial could ever return its verdict.
The regulators lean on this, deliberately. Accelerated pathways exist precisely to let a treatment through on the strength of a surrogate, on the bet that the patient benefit will follow and on the promise that someone will confirm it afterwards. For a fatal disease with nothing else on the shelf, that is a defensible trade — speed now, confirmation later. The trouble is everything packed into the word "later", and everything assumed in the word "follow". The bet is sometimes lost. The confirmation is sometimes never delivered, or arrives to say the bet was lost. The surrogate buys time, and time is occasionally all it buys.
How a surrogate betrays you
The failure has a shape, and once you have seen the shape you start spotting it everywhere.
A surrogate fails when the intervention moves it by a path that bypasses the outcome. The marker is real, the disease is real, the association between them is real, and still the drug reaches the marker through a back route that never touches the thing the patient cares about. The anti-arrhythmics quietened the rhythm and stopped hearts by a different door. The whole tragedy lives in the word "and": the drug did the good thing it was measured on and a bad thing it was not, and the surrogate, by construction, could only ever report the first.
It fails, too, when the marker was never on the causal road to begin with — merely travelling alongside it. Plenty of markers rise and fall with a disease without driving it. They are passengers, not the engine. Push a passenger around and the engine does not care. A drug can drag such a marker in the flattering direction and leave the underlying process entirely untouched, which is how you get treatments that make the bloodwork look like recovery while the patient continues, quietly, in the wrong direction.
And it fails most expensively when the surrogate captures only one ledger. A treatment has a full balance sheet: benefits and harms, the intended effect and everything else it does to a body. A surrogate, almost by definition, watches a single line of that sheet. It can show the intended benefit landing perfectly while net harm accumulates off-screen, in side effects and second mechanisms the marker was never built to see. A drug can win on its surrogate and lose on the patient, and the surrogate will report the win with a straight face, because losing was never in its field of view. The history of medicine is not short of occasions when exactly this happened: the marker moved, the milestone was confidently declared, and the outcome, when someone finally measured it, declined to cooperate.
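The single-ledger failure can be shown with a toy two-pathway model. This is a sketch under loudly stated assumptions: every number, and the linear form of the risk function, is hypothetical and invented for illustration. It models no real drug or trial.

```python
from dataclasses import dataclass

# Hypothetical constants, chosen only to make the arithmetic visible.
BASELINE_RISK = 0.02          # death risk unrelated to the marker
RISK_PER_MARKER_UNIT = 0.001  # death risk carried by each unit of the marker


@dataclass(frozen=True)
class Arm:
    marker_level: float     # the surrogate the trial measures (arbitrary units)
    off_target_risk: float  # harm from a second mechanism the marker cannot see


def death_risk(arm: Arm) -> float:
    """Total risk: the on-path contribution plus whatever happens off the ledger."""
    return BASELINE_RISK + RISK_PER_MARKER_UNIT * arm.marker_level + arm.off_target_risk


placebo = Arm(marker_level=10.0, off_target_risk=0.0)
treated = Arm(marker_level=2.0, off_target_risk=0.02)

marker_change = treated.marker_level - placebo.marker_level
risk_change = death_risk(treated) - death_risk(placebo)

print(f"marker change:     {marker_change:+.1f} units")
print(f"death risk change: {risk_change:+.3f}")
```

In this toy setup the marker falls by 80% and the death risk still rises, because the off-target term is larger than the benefit bought along the marker's own path. A trial that recorded only `marker_level` would report the first number and have no way to see the second.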
The reader's test
You cannot re-run the trials, so the defence is a habit of reading. Three questions do most of the work.
First: has this surrogate earned its trust for this intervention class, or is it borrowing credibility from elsewhere? Surrogacy is not a general licence. A marker that has proved itself a faithful proxy for one mechanism, one drug class, one disease, has earned nothing whatsoever for a different drug that reaches it by a different route. The validation is specific or it is nothing. When a paper leans on a surrogate, the question is not "is this a respectable marker in general?" but "has moving this marker with this kind of intervention actually predicted patient benefit before?" Often the honest answer is no, and the surrogate is wearing a reputation it borrowed from a neighbour.
Second: is the causal chain argued, or merely assumed? A careful paper shows its working — here is the marker, here is the mechanism by which moving it should move the outcome, here is why we believe this drug travels that road and not a back route. A paper hoping you will not notice simply gestures at the association and proceeds as though the link were settled. The association is never the argument. That a marker tracks a disease tells you it might be a passenger; it does not tell you it is the engine, and the difference is the entire question.
Third — and this is the tell to keep closest — watch the language for outcome-flavoured words describing marker-only results. This is where spin does its quietest work. A study moves a biomarker and the prose starts murmuring about "benefit", "response", "improvement", "protection" — words that smuggle in the patient outcome the study never measured. A marker went down. That is the finding. Everything past it — every implication that a life was lengthened or a stroke averted — is extrapolation in the costume of a result. The slippage from "the marker improved" to "the patient improved" usually happens in a single unguarded sentence, and that sentence is doing more work than the entire methods section beneath it.
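The third question lends itself to a crude mechanical aid. The sketch below flags sentences that contain the outcome-flavoured words named above so that a human can check whether an outcome was actually measured. The function name and the sample abstract are invented for illustration, and the word list is a starting point rather than a validated lexicon. A flag is a prompt to reread the sentence, not a verdict on the paper.

```python
import re

# The outcome-flavoured words named in the text; extend with care.
OUTCOME_WORDS = ("benefit", "response", "improvement", "protection")

# Match each word and simple suffixes such as "benefits" or "responses".
_PATTERN = re.compile(r"\b(?:" + "|".join(OUTCOME_WORDS) + r")\w*", re.IGNORECASE)


def flag_outcome_language(text: str) -> list[str]:
    """Return the sentences that use outcome-flavoured words, for manual review."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if _PATTERN.search(s)]


# Invented sample text, not taken from any real paper.
abstract = (
    "The marker fell by 30% at 12 weeks. "
    "These results demonstrate clear cardiovascular protection. "
    "No clinical events were recorded."
)

flagged = flag_outcome_language(abstract)
for sentence in flagged:
    print("CHECK:", sentence)
```

Here only the middle sentence is flagged, and it is the one claiming "protection" in a study that, by its own last sentence, recorded no clinical events.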
The wider pattern
None of this stays in the clinic, because the surrogate endpoint problem is one specific clinical face of a law that governs measurement everywhere. When you cannot watch the thing you care about, you watch a proxy — and the moment the proxy becomes the target, it starts to come apart from the thing it stood for.
Education does it: the test score stands in for learning, so schools optimise the score, and the score drifts free of the learning it was meant to certify. Technology does it: engagement stands in for value, so products farm the engagement, and a metric meant to signal that people were served comes instead to signal that they were captured. The AI field does it relentlessly — a benchmark stands in for capability, the models are tuned until the benchmark is saturated, and the number climbs while the underlying competence it was supposed to track quietly fails to keep pace. The same hollowing-out shows up in clinical AI evaluation, where a tidy accuracy figure on a curated test set stands in for usefulness in the actual mess of care, and the figure can gleam while the usefulness does not arrive. The pattern is identical to the marker that moves while the patient does not.
That general form has a name — Goodhart's law, the principle that a measure used as a target stops being a good measure. The surrogate endpoint is what Goodhart's law looks like when the stakes are mortality and the target is something you can put in a syringe. Recognise it in one place and you have a lens for all of them: the proxy is not the thing, and pressure on the proxy is exactly the force that pries them apart.
What this means
A number that stands in for the thing is not the thing. That is the whole of it, and it sounds too obvious to need saying, right up until a quiet monitor and a tidy lab value persuade everyone that something good has happened to a person when the only demonstrated event is that a marker moved. Surrogates are not the enemy — modern medicine would crawl without them, and the validated ones genuinely deliver good treatments to patients sooner than any other instrument could. But the surrogate is always a bet that the stand-in resembles the thing it stands for, and that bet has to be placed consciously, with eyes open, every single time. The disasters do not come from using surrogates. They come from forgetting that you did — from letting the marker move and walking away as though the patient had been saved, when all anyone actually showed was that the number went down.
Key Takeaways
- A surrogate endpoint measures a marker, not the outcome the patient lives or dies by — and the gap between the two is exactly where harm hides.
- Medicine's own history records treatments that improved the surrogate beautifully and worsened survival, because the drug reached the marker by a route that bypassed — or actively damaged — the patient.
- A surrogate earns trust per intervention class, never in general: validation for one mechanism transfers nothing to a different drug that moves the same marker by a different path.
- The tell to watch for is outcome-flavoured language — "benefit", "response", "protection" — describing a result that only ever measured a marker; the slip from "the number improved" to "the patient improved" usually hides in one sentence.
- The surrogate endpoint problem is Goodhart's law with mortality at stake: the proxy is not the thing, and using it as the target is the very force that pulls them apart — in medicine, in education, in AI benchmarks alike.