The OMOP standard vocabulary contains more than 3 million medical concepts, 20 million relationships and 100 million concept properties. At that scale, exploring the relationships between terminologies is very challenging.

We built a graph database to allow technical and non-technical users to explore the complex relationships that were previously hidden in the data.

Customer

An international technology services company in the healthcare sector.


Problem

Healthcare faces challenges because competing standards are used to describe symptoms, drugs and diseases. The Observational Medical Outcomes Partnership (OMOP) has made some headway towards managing this complexity by building a database of synonyms to bridge different terminologies. The standard vocabulary consists of more than 3 million medical concepts, 20 million relationships and 100 million concept properties. These are stored in a spreadsheet-like relational database, which makes it very challenging to explore the relationships between terminologies.

To make sense of the information, the user must have significant technical knowledge. Faculty was asked if we could develop a computational method to improve the process of bridging terminologies.


Solution

Our solution was to use a Neo4j graph database to represent each concept as a node and each relationship between concepts as an edge. We built a web application that lets users search for concepts or links and then rapidly explore how they are related. This was a vast improvement on the standard relational database.
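To illustrate the idea of concepts as nodes and relationships as edges, here is a minimal sketch in plain Python. It is not the project's Neo4j implementation: the sample rows, the vocabulary names and the helpers `build_graph` and `neighbours` are all hypothetical, chosen only to show how relational rows turn into a graph that can be walked directly.

```python
from collections import defaultdict

# Hypothetical sample rows, loosely shaped like a concept table and a
# relationship table. The IDs, names and vocabularies are made up.
concepts = [
    (1, "Myocardial infarction", "Vocabulary A"),
    (2, "Heart attack", "Vocabulary B"),
    (3, "Aspirin", "Vocabulary C"),
]
relationships = [
    (2, 1, "Maps to"),
    (3, 1, "May treat"),
]


def build_graph(concepts, relationships):
    """Turn relational rows into nodes (by ID) and outgoing edge lists."""
    nodes = {cid: {"name": name, "vocabulary": vocab}
             for cid, name, vocab in concepts}
    edges = defaultdict(list)
    for source, target, relation in relationships:
        edges[source].append((relation, target))
    return nodes, edges


def neighbours(nodes, edges, concept_id):
    """List (relationship, concept name) pairs reachable in one hop."""
    return [(relation, nodes[target]["name"])
            for relation, target in edges.get(concept_id, [])]


nodes, edges = build_graph(concepts, relationships)
print(neighbours(nodes, edges, 2))
```

In a relational layout the same question would need a join between the two tables; in the graph form it is a single lookup on the node's edge list, which is what makes interactive exploration fast.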

To make the data easier to navigate, we devised a novel ranking algorithm inspired by PageRank, the algorithm behind Google Search. By ranking nodes by relevance, we saved analysts significant time in homing in on the correct ontology.
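The details of the ranking algorithm are not described here, so the following is only a sketch of standard PageRank, the technique it was inspired by, and not the algorithm used in the project. The function name `pagerank`, the damping factor of 0.85, the iteration count and the toy graph are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Standard PageRank by power iteration over a dict of outgoing edges."""
    # Collect every node that appears as a source or a target.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the same baseline share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges.get(node, [])
            if targets:
                # Split this node's rank evenly across its outgoing edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing edges spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three concepts point at "c", and "c" points at "a".
toy_edges = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
scores = pagerank(toy_edges)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Nodes that many other nodes link to score highest, so sorting search results by such a score would surface heavily referenced concepts first.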

The graph database also supports graphical visualisation, which makes it easier for a user to navigate and explore complex relationships in the data.


Impact

Representing the OMOP data as a graph database opened up new possibilities for both our client and their clients. The tool helped technical and non-technical users alike, including those with specialist medical knowledge but little technical background, to explore the complex relationships that were once hidden in the data. It also provided an excellent platform for further machine learning on the data.