AI safety in healthcare
In this blog, one of our Senior Data Scientists, Jon Holdship, discusses several concerns around AI safety that arise when ML algorithms are used to enhance healthcare, and why they need to be considered from the outset of AI design.
AI safety should be a concern for everyone who implements AI, and nowhere more so than in healthcare. Whether by endangering patients or exacerbating inequalities, the potential negative impacts of AI are sharpened in health. However, the benefits are also great, so we must be deliberate about safety, implementing systems that we know are improvements over the status quo.
When we discuss AI safety today, we typically hear about hallucinations and harmful outputs of large language models (LLMs). However, safety is an important concern for all machine learning (ML) applications, from patient risk stratification to diagnostic models. In fact, many key aspects of AI safety cut across all forms of AI. In this blog, I'll cover several important areas of consideration, noting the pitfalls and what you must think about from the earliest stages of design.
Accuracy
Fundamental to the performance of ML algorithms is their accuracy. Whether it’s the misdiagnosis of a patient by a diagnostic AI or incorrect information in a discharge summary written by an LLM, incorrect outputs cause harm. In healthcare, two considerations are vital.
Firstly, AI typically replaces an existing system, and that gives a clear benchmark for accuracy. We can judge the safety of an AI inspecting X-rays by asking whether it is at least as accurate as a radiologist. We can’t expect ML systems to be perfect, but we can demand they meet existing benchmarks. In fact, clear benchmarks also prevent over-cautiousness. There is a tendency to reject change if the new system is not perfect, but we can use benchmarks to show that an ML model improves on the current system. We may be inclined to reject a triaging AI if we see it put some high-risk people in a low-priority category. But if it does so less often than the staff who currently triage patients, isn't it improving patient safety?
Secondly, for high-risk applications such as diagnostic AI, ongoing monitoring of the system is crucial. Changes to machines, infrastructure and personnel greatly impact the accuracy of established models. For example, in one UK screening programme, a simple software upgrade to the mammography equipment caused a breast cancer screening AI to triple the fraction of patients it believed had cancer, despite there being no change to the AI model or to the prevalence of cancer among the patients. Where there is clear risk to patients if accuracy falls, accuracy should be continually monitored to prevent these kinds of changes going unnoticed.
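As a simple illustration of what that monitoring can look like, the sketch below compares the recent rate of positive predictions with a historical baseline and raises an alert when the shift is far outside normal statistical variation. The numbers are invented to mirror the screening example; a real monitoring pipeline would also track calibration, per-site breakdowns and data quality.

```python
import numpy as np

def rate_shift_alert(baseline_preds, recent_preds, z_threshold=3.0):
    """Compare the recent positive-prediction rate with a historical baseline.

    A two-proportion z-test: if the recent rate sits many standard errors away
    from the baseline rate, something upstream (equipment, software, data
    pipeline) has probably changed and the model needs reviewing.
    """
    p0, p1 = np.mean(baseline_preds), np.mean(recent_preds)
    n0, n1 = len(baseline_preds), len(recent_preds)
    pooled = (p0 * n0 + p1 * n1) / (n0 + n1)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    z = (p1 - p0) / se
    return abs(z) > z_threshold, z

# Illustrative numbers only: a screening model that historically recalls ~4%
# of patients suddenly recalls ~12% after an equipment software upgrade.
rng = np.random.default_rng(0)
baseline = rng.binomial(1, 0.04, size=5000)
recent = rng.binomial(1, 0.12, size=500)

alert, z = rate_shift_alert(baseline, recent)
print(alert, round(z, 1))  # True: the jump is far outside normal variation
```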
Bias
By bias, we mean an ML model giving different outputs based on the protected characteristics of an individual. This bias can arise from many factors including bias in data collection, social inequality or a real medical difference between demographics.
The source of bias is important to consider when deciding how to address it. We’d expect a model to predict a higher risk of breast cancer in women than in men, due to the real difference in prevalence between these groups. However, we would want to prevent a model from replicating the socially driven underdiagnosis of heart attacks in women. In the latter case, any training data representative of past diagnosis rates would cause the AI to learn this bias. Techniques such as adversarial debiasing, which modify the model so its outputs do not depend on variables such as gender, need to be applied to mitigate it. A complication of debiasing is that it becomes difficult to measure accuracy. If a diagnostic model stops incorrectly underdiagnosing women, it will appear less accurate than a biased model when tested on biased data, even though it is simply no longer missing women it should have diagnosed.
The problem is with the test data, not the model. If the test data itself is biased, it will falsely make the fairer model look worse.
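To illustrate the idea behind adversarial debiasing, here is a minimal sketch in PyTorch on synthetic data: an adversary tries to recover the protected attribute from the predictor's output, and the predictor is penalised whenever it succeeds, pushing it towards accurate predictions that carry no information about that attribute. The data, network sizes and penalty weight are invented for illustration; a production implementation needs far more care.

```python
import torch
import torch.nn as nn

# Synthetic, purely illustrative data: X are clinical features, y are diagnosis
# labels that are biased with respect to the protected attribute a (e.g. sex).
torch.manual_seed(0)
n, d = 2000, 10
X = torch.randn(n, d)
a = torch.randint(0, 2, (n, 1)).float()                            # protected attribute
y = ((X[:, :1] + 0.8 * a + 0.2 * torch.randn(n, 1)) > 0).float()   # biased labels

predictor = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))

opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
penalty = 1.0  # how strongly we punish predictions that reveal the attribute

for step in range(2000):
    # 1. Train the adversary to guess the protected attribute from the prediction.
    logits = predictor(X).detach()
    adv_loss = bce(adversary(logits), a)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2. Train the predictor to diagnose well *and* to fool the adversary,
    #    so its output carries as little information about the attribute as possible.
    logits = predictor(X)
    pred_loss = bce(logits, y) - penalty * bce(adversary(logits), a)
    opt_pred.zero_grad(); pred_loss.backward(); opt_pred.step()
```

The trade-off described above shows up directly here: as the penalty weight increases, the predictor looks less accurate when scored against the biased labels, even as its outputs become independent of the protected attribute.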
How the model is used is also a consideration. If a model were to predict lower outpatient attendance for a particular ethnicity, it could reflect a systemic social problem. Retaining this signal might be beneficial if the model is used to target these patients with additional support, actively addressing that healthcare inequality. However, if the model were used to double-book appointments, we might exacerbate poor health outcomes for these patients, and now the bias is harmful.
Explainability
Many ML algorithms, particularly those based on neural networks, have the potential to be black boxes. Users are left with little to no understanding of how the input to the model produces the output. Even with good accuracy statistics, it’s difficult for users to trust models when they don’t understand the reasoning. Given that one of the biggest defences against harm from AI comes from human-in-the-loop processes, where AI outputs inform the decisions of an expert, we want to empower that decision-maker. The model needs to explain how the prediction is reached from the input data.
Fortunately, there are many ways to ensure AI predictions are explainable, from using simple, inherently explainable models to applying ML techniques such as Shapley values to extract explanations from a black-box model. ML algorithms must be embedded in systems that provide these explainability tools, informing users which specific parts of the input data have driven the output. This helps uncover bias, flags strange behaviour to experts, and builds trust with users who may otherwise ignore genuinely useful AI.
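As a small illustration of the second approach, the sketch below uses the open-source shap library to attribute a single patient's prediction to the input features that drove it. The dataset, feature names and model choice are synthetic stand-ins, not taken from any of our projects.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy data standing in for a patient-level risk model
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)),
                 columns=["age", "blood_pressure", "bmi", "prior_admissions"])
y = (0.8 * X["age"] + 0.5 * X["prior_admissions"] + rng.normal(size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Shapley values attribute each individual prediction to the input features
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# For one patient, rank the features by how strongly they pushed the prediction
patient = 0
contributions = sorted(zip(X.columns, shap_values[patient]),
                       key=lambda fv: -abs(fv[1]))
for feature, value in contributions:
    print(f"{feature}: {value:+.3f}")
```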
For example, in our work with the NHS on Covid-19 admissions, we combined data such as Covid testing statistics, past admissions and mobile phone data to predict Covid admissions for the coming weeks. By making multiple predictions, including or excluding these sources, we could explain to users how much each source affected our predictions. This allowed expert users to weigh the information we provided using broader contextual knowledge, such as the way Covid testing changed over time. More importantly, it built trust in a complex system, making the tool genuinely useful as decision-makers chose to follow its predictions.
Fairness
Related to bias is the concept of fairness. There are many definitions of fairness in the literature, with different fields favouring different ideas. However, given that we have covered inequality of outcome in bias, we focus on inequality of accuracy as our definition of fairness. A model is unfair if it is less accurate for one demographic than another. For example, an X-ray diagnostic model that more accurately identifies fractures in male bodies than in female bodies would be unfair.
Unfairness often arises from insufficient sample sizes for certain demographics, or from greater weight being placed on one demographic than another; both are often related to population sizes. From a technical viewpoint, one can assess this by measuring accuracy broken down by demographic and holding the model to the same benchmark for every group. At Faculty, this is standard practice when evaluating models. One key example is our work on the National Covid-19 Chest Imaging Database, where we evaluated diagnostic models built by third parties for the NHS. We assessed fairness by performing statistical comparisons of performance between patient sub-groups.
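To make that concrete, here is a minimal sketch of the kind of check involved, on invented data: compute accuracy separately for each demographic group, then test whether the gap between groups is larger than chance would explain. The group labels, error rates and choice of a chi-squared test are illustrative; real evaluations use the metrics and statistical tests appropriate to the clinical task.

```python
import numpy as np
from scipy.stats import chi2_contingency

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy broken down by demographic group (a minimal sketch)."""
    return {g: float(np.mean(y_true[groups == g] == y_pred[groups == g]))
            for g in np.unique(groups)}

def compare_groups(y_true, y_pred, groups, g1, g2):
    """Chi-squared test: is the error rate plausibly the same in both groups?"""
    table = []
    for g in (g1, g2):
        correct = np.sum((groups == g) & (y_true == y_pred))
        wrong = np.sum((groups == g) & (y_true != y_pred))
        table.append([correct, wrong])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

# Illustrative data: suppose the model is wrong more often for group "F"
rng = np.random.default_rng(0)
groups = rng.choice(["F", "M"], size=2000)
y_true = rng.integers(0, 2, size=2000)
flip = rng.random(2000) < np.where(groups == "F", 0.20, 0.08)  # higher error rate for "F"
y_pred = np.where(flip, 1 - y_true, y_true)

print(per_group_accuracy(y_true, y_pred, groups))
print(f"p-value for equal error rates: {compare_groups(y_true, y_pred, groups, 'F', 'M'):.4f}")
```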
A model found to be unfair can often be fixed with a more appropriate choice of model, or by using multiple models, each trained on one group. From a broader viewpoint, an understanding of what data is collected, and who is excluded from it, is vital. Do current processes leave one group under-represented in the data? Whilst measuring the problem and applying technical mitigations are the domain of technical experts, the broader context of data collection is a systemic issue.
Final thoughts
I’ve listed several concerns around AI safety when ML algorithms are used to enhance healthcare. But rather than letting these concerns prevent us from using AI to improve our healthcare system and patient care, we simply need to be aware of them and deliberately design AI innovation projects to protect against them from the outset.
With the right expertise, each of these problems can be assessed and mitigated to produce safe, effective AI that makes healthcare better for everyone.
If you’d like to learn more about AI safety in healthcare, explore our other articles or get in touch.