AI Safety

Clinician-led safety red-teaming for healthcare AI

Patient-facing AI rarely fails by not knowing the diagnosis. It fails after recognition — a missing instruction, a softened disposition, a dropped harm-prevention step, the wrong country's emergency pathway. These failures don't show up on benchmarks. They show up in front of a clinician running a locked protocol.

I'm an emergency physician (GMC-registered). Emergency medicine is professional uncertainty management under time pressure — which is exactly what safety evaluation of patient-facing AI requires. My public audit of three frontier models (read it here) found the patterns; the same method, run against your product, finds yours before your enterprise buyers, investors or regulators do.

How it works

Scenario design tuned to your product and population → locked grading protocol (pre-registered failure taxonomy F1–F8, severity scale S0–S3, three clinical dimensions) → runs with variance testing → severity-graded findings with verbatim receipts → a prioritised risk-reduction backlog your team can ship: escalation wording, mandatory next-step instructions, location handling, dispositional guardrails.
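
As a rough illustration only, here is one way a team might record graded runs and summarise repeated-run variance for a single scenario. This is a hypothetical sketch, not the protocol itself. The names `GradedRun` and `summarise_variance`, the field layout, and the assumption that S3 is the most severe grade are all invented for the example. The F1–F8 codes are treated as opaque labels, and the three clinical dimensions are not modelled.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class Severity(IntEnum):
    # Assumption: S0 is the least severe grade and S3 the most severe.
    S0 = 0
    S1 = 1
    S2 = 2
    S3 = 3


# F1..F8 are opaque labels here; their meanings live in the locked protocol.
FAILURE_CODES = tuple(f"F{i}" for i in range(1, 9))


@dataclass(frozen=True)
class GradedRun:
    """One graded model response for one scenario (hypothetical record shape)."""
    scenario_id: str
    run_index: int
    failure_code: Optional[str]  # None when no failure was graded
    severity: Severity
    receipt: str  # verbatim excerpt supporting the grade

    def __post_init__(self) -> None:
        if self.failure_code is not None and self.failure_code not in FAILURE_CODES:
            raise ValueError(f"unknown failure code: {self.failure_code}")


def summarise_variance(runs: Sequence[GradedRun]) -> dict:
    """Summarise how often, and how badly, repeated runs of a scenario failed."""
    if not runs:
        raise ValueError("at least one run is required")
    failing = [r for r in runs if r.failure_code is not None]
    return {
        "runs": len(runs),
        "failure_rate": len(failing) / len(runs),
        "worst_severity": max(r.severity for r in runs).name,
        "failure_counts": dict(Counter(r.failure_code for r in failing)),
    }
```

A summary like this keeps the worst grade visible alongside the failure rate, so a finding that appears in only some runs is not averaged away.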

Three formats

Pulse check: 10 scenarios on your highest-risk flow · repeated-run variance on the worst finding · short written report + 45-min call · 1 week. The easy yes: find out what a clinician sees in your product in seven days.

Focused red-team: 30–50 scenarios tailored to your use case and populations (elderly, pregnancy, polypharmacy, mental health) · full taxonomy + severity grading · verbatim receipts · remediation backlog · readout call with your product/eng team · 2–3 weeks. The standard engagement: a defensible answer to "how do you know it's safe?"

Pre-enterprise / deep red-team: 100+ scenarios · special populations and medication traps · adversarial phrasings · before/after re-test following your fixes · board/investor-ready report aligned to your risk register and regulatory pathway · scoped per product. For teams heading into hospital pilots, enterprise procurement, or regulatory conversations.

Pricing is scoped per engagement, not listed — email me to talk through scope and pilot pricing.

What this is not

Not a regulatory certification, not a clinical validation study, and not a substitute for your quality-management process. It is the structured clinical risk-discovery layer that makes all three of those conversations easier. Device classification and formal regulatory strategy are separate work: I'll tell you when you need them and won't pretend to provide them.

Book a 20-minute scoping call →

Read the public audit →