Helping the NHS create the first national COVID-19 database
As the COVID-19 pandemic continues to evolve, it’s becoming increasingly clear that data is one of the most powerful tools at healthcare’s disposal.
So much is still unknown about the disease, and the need to develop better and faster ways of treating it is so pressing, that there’s little room for failure when it comes to developing new treatments and technologies. Healthcare providers need to know that the technology provided to them has been robustly tested, is easy to use, and has been validated on real patient data.
That’s where the National COVID-19 Chest Imaging Database (or NCCID) comes in: an NHS-controlled database of over 40,000 CT scans, MRIs and X-rays from more than 10,000 patients across the UK during the course of the pandemic.
A joint initiative between NHSX, the British Society of Thoracic Imaging (BSTI), Royal Surrey NHS Foundation Trust and Faculty, the NCCID is designed to enable the development of software that helps doctors and researchers to:
Understand the impact and progression of the disease
Assess the severity of the condition in individual patients
Identify factors that may complicate recovery
Prioritise patients whose condition is most likely to deteriorate
Hospitals and universities across the country can access the NCCID by applying through a rigorous approval process, and successful applicants use the images to track patterns and markers of illness. The database can speed up understanding of COVID-19 by enabling study of the disease, which could lead to quicker treatment plans and a better understanding of whether a patient may end up in a critical condition.
How can data and AI help fight COVID-19?
Chest imaging, specifically X-rays and CT scans, can tell us a lot about a COVID-19 patient – the information from these scans can be used to make diagnoses, decide treatment, estimate the possible progression of the disease, and prioritise certain patients for close monitoring or urgent care.
In the right circumstances, they could also be used to create and test a host of new forms of technology for healthcare providers. But, to ensure that these tools will be just as effective at the patient’s bedside as they are in the lab, technology developers need a representative sample of chest images.
Getting hold of that representative sample isn’t easy: though healthcare providers across the country are collecting chest images rapidly, these data samples are often confined to a specific slice of the population in a specific geographical area.
Technology providers need access to a sample of anonymised patient data that captures two essential things:
The variety of COVID-19 patients – it’s clear that the disease manifests very differently between patients, so it’s vital to get a sample of data that spans ethnicities, patient age and geographic regions.
The scale of data required for computer analysis to understand the full complexity of the disease.
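To make the first criterion concrete, here is a minimal sketch of how a developer might check whether a sample spans the groups they care about. It is an illustration only, not part of the NCCID tooling: the `underrepresented_groups` helper, the `region` field name and the 10% threshold are all hypothetical.

```python
from collections import Counter


def underrepresented_groups(records, field, min_share=0.05):
    """Return the values of `field` whose share of the sample is below `min_share`.

    `records` is a list of dicts; both the field names and the threshold
    are assumptions made for this example.
    """
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return sorted(group for group, n in counts.items() if n / total < min_share)


# Hypothetical sample: 19 records from one region and 1 from another.
sample = [{"region": "London"}] * 19 + [{"region": "Wales"}]
flagged = underrepresented_groups(sample, "region", min_share=0.10)
print(flagged)
```

The same check could be repeated for age bands or ethnicity. A group that is flagged signals where more data would need to be collected before a tool could be trusted for those patients.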
The NCCID is designed to merge data collection efforts across the country, creating a database of chest images that fulfils those criteria. Hospitals are being asked to submit scans going back three years for patients who have tested positive for the virus, and scans from within the last four weeks for a smaller sample of patients who tested negative. Clinical data will also be collected, including the age, gender and racial or ethnic origin of the patient.
Faculty's role in the NCCID
When we kicked off our partnership with NHSX to develop its AI Lab last year, we were delighted to be helping the NHS lead the way in supporting the adoption of AI at the healthcare frontline, ultimately to improve patient outcomes. This has become even more important in light of the current COVID-19 pandemic.
With the data hosted on AWS, we’ve developed the warehouse infrastructure for safely storing and accessing the data and implemented bespoke data management processes that allow us to create a separate dataset dedicated to validation, which is never used for model training.
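One common way to keep a validation set strictly separate from training data is to assign each patient, rather than each image, to a split, so that no patient's scans appear on both sides. The sketch below illustrates that idea under our own assumptions. It is not the NCCID's actual data management process, and the function names, the 20% validation share and the `(pseudonym, image_id)` record shape are hypothetical.

```python
import hashlib


def assign_split(pseudonym: str, validation_percent: int = 20) -> str:
    """Deterministically assign a patient to 'validation' or 'training'.

    Hashing the pseudonym means the same patient always lands in the same
    split, however many times the pipeline is re-run.
    """
    digest = hashlib.sha256(pseudonym.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # integer in 0..99
    return "validation" if bucket < validation_percent else "training"


def split_images(images, validation_percent: int = 20):
    """Group image IDs by split; `images` yields (pseudonym, image_id) pairs."""
    splits = {"training": [], "validation": []}
    for pseudonym, image_id in images:
        splits[assign_split(pseudonym, validation_percent)].append(image_id)
    return splits


# Hypothetical records: patient "p1" has two scans, which must stay together.
images = [("p1", "scan-a"), ("p1", "scan-b"), ("p2", "scan-c"), ("p3", "scan-d")]
splits = split_images(images)
print(splits)
```

Because the assignment depends only on the patient's pseudonym, newly submitted scans from an existing patient automatically join that patient's split, and the validation data never leaks into training.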
We’re also providing NHSX with access to a secure environment that supports the process of model development and validation. Within one platform, AI models can be built, tested, and have their performance validated on a carefully curated validation dataset.
Balancing accuracy and data privacy
It’s vital that this quest for accuracy and clarity doesn’t compromise the privacy and security of patients; this database is built to help shape treatment well into the future, so we simply can’t sacrifice long-term privacy in the name of shorter-term progress.
Data collection for the project is designed to hold up to rigorous academic and commercial guidelines; the team gathers data using an established process developed by Royal Surrey, which removes any personally identifiable patient data at source before it’s sent to a centralised data warehouse. By assigning a pseudonym to each patient, we can link clinical data with database images while making it impossible to uncover the patient’s true identity. The NHS/CHI number of each patient is also encrypted using an AES encryption algorithm and a complex salt, which allows us to link the data to national datasets without compromising privacy.
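To show how a pseudonym can link records without exposing an identity, here is a minimal sketch. Note that it swaps in a different technique from the one described above: rather than AES encryption with a salt, it uses a keyed hash (HMAC-SHA256) from Python's standard library, purely because that is easy to demonstrate. The `pseudonymise` function, the secret key and the all-zero identifier are hypothetical and do not reflect the Royal Surrey process.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient identifier.

    Without the secret key, the pseudonym cannot feasibly be traced back
    to the identifier, yet the same identifier always maps to the same
    pseudonym, so records can still be linked.
    """
    normalised = patient_id.replace(" ", "").encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held securely by the data controller.
key = b"hypothetical-secret-key"

# Clinical data and images arrive separately but share a pseudonym,
# even if the identifier was formatted differently at each source.
clinical = {pseudonymise("000 000 0000", key): {"age": 67}}
image_pseudonym = pseudonymise("0000000000", key)
print(image_pseudonym in clinical)
```

The linking step works on pseudonyms alone, so anyone joining clinical data to images never needs to see the original identifier.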
Even once de-identified or pseudonymised, the data is only available to a select few. Any researchers or developers wishing to access the database must be assessed by a committee of scientific, technology, information governance and ethics advisors to ensure that their use of the data is justified and ethical.
What's next for the NCCID?
According to NHSX, teams from four British university consortiums are already using the NCCID to develop and test tools for the diagnosis, management and prioritisation of COVID-19 patients.
The database already contains 40,000 images, but there’s more work to do: there are currently enough COVID-positive images to build a validation set that can ensure AI tools currently in development meet the technical standards required for use in the NHS.
In the words of Dominic Cushnan, Head of AI Imaging at NHSX:
“We are applying the power of artificial intelligence to quickly detect disease patterns and develop new treatments for patients. There is huge potential for patient care, whether through quicker analysis of chest images or better identification of abnormalities.
“The industrial scale collaboration of the NHS, research and innovators on this project alone has demonstrated the huge potential and benefits of technology in transforming care.”
If you’re interested in contributing to or using NCCID data for a software project, visit the NHSX website to find out more.