Ontology-driven modeling of healthcare data using a graph database
Visualizing large-scale, disconnected health care data is becoming one of the biggest challenges in the big data community. The difficulty stems from the heterogeneous and unstructured nature of the data and from the complex relationships between entities. Modeling the data correctly helps the health care industry answer important questions, for example by analyzing professional health care networks for better connectivity and referrals, analyzing sales data for predictive analysis of accounts and sales representatives, and providing extensive insights into clinical trials.
These problems can be addressed by integrating disparate, heterogeneous data sources so that analytics and meaningful insights can be derived from the data. The data integration process described in this presentation uses an ontology of vocabularies and taxonomies developed specifically for the pharmaceutical domain. The ontology helps in understanding and transforming the data: it brings together the relevant information for each entity to form a centralized repository of entities such as hospitals, health care professionals, patients, and others. The result is a 360-degree view of each profile, stored as a document in a MongoDB database. These profiles are then interlinked using machine learning algorithms and fuzzy logic. The platform builds an entity-attribute-value model based on this ontology, with MongoDB and Neo4j as the backend databases. A notable feature of this ontology-based platform is that it is adaptive, extensible, and easy to maintain.
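The integration steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ONTOLOGY` mapping, the field names, the entity ids, and the sample records are hypothetical, and plain string similarity from Python's standard `difflib` module stands in for the machine learning and fuzzy-logic linking, whose details the abstract does not give. Profiles are held in ordinary dictionaries where the platform would use MongoDB documents.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical ontology fragment: maps source-specific field names
# to canonical attribute names.
ONTOLOGY = {
    "hcp_name": "name",
    "physician": "name",
    "facility_name": "name",
    "speciality": "specialty",
    "hosp": "affiliation",
}


def to_triples(entity_id, record):
    """Convert one source record into (entity, attribute, value) triples."""
    return [
        (entity_id, ONTOLOGY[field], str(value).strip())
        for field, value in record.items()
        if field in ONTOLOGY  # fields unknown to the ontology are dropped
    ]


def build_profiles(triples):
    """Fold triples into one document-style profile per entity."""
    grouped = defaultdict(lambda: defaultdict(set))
    for entity_id, attribute, value in triples:
        grouped[entity_id][attribute].add(value)
    return {
        entity_id: {attr: sorted(values) for attr, values in attrs.items()}
        for entity_id, attrs in grouped.items()
    }


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_affiliations(profiles, threshold=0.85):
    """Link an entity to any other entity whose name resembles its affiliation."""
    links = []
    for source_id in sorted(profiles):
        for affiliation in profiles[source_id].get("affiliation", []):
            for target_id in sorted(profiles):
                if target_id == source_id:
                    continue
                names = profiles[target_id].get("name", [])
                best = max((similarity(affiliation, n) for n in names), default=0.0)
                if best >= threshold:
                    links.append((source_id, target_id))
    return links


# Two hypothetical sources describe the same professional; a third lists hospitals.
triples = (
    to_triples("hcp:1", {"hcp_name": "Jane Smith", "speciality": "Cardiology"})
    + to_triples("hcp:1", {"physician": "Jane Smith", "hosp": "St Marys General Hospital"})
    + to_triples("hospital:1", {"facility_name": "St. Mary's General Hospital"})
    + to_triples("hospital:2", {"facility_name": "Lakeside Clinic"})
)
profiles = build_profiles(triples)
links = link_affiliations(profiles)
```

Here the two records for `hcp:1` collapse into a single profile, and the slightly different spellings of the hospital name are still close enough to link the professional to `hospital:1` but not to `hospital:2`. The 0.85 threshold is an arbitrary choice for the sketch.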
To perform complex set operations and predictive analysis over thousands of entities, the platform uses a graph database with an index-heavy, entity-based composition model. This model accommodates unstructured data at large scale while providing a flexible and efficient way to organize the data for large set operations. Alongside the entities, attributes are expressed as first-class nodes, which makes it possible to understand the concrete and complex patterns that connect entities through attributes. The model can also use materialized views and indexes on nodes and relationships where performance is critical. This hybrid MongoDB and Neo4j model has proven efficient and performant for data ranging from several megabytes to gigabytes, with relationships that span several entities. With the help of the ontology and the graph database, pharmaceutical companies can visualize the 360-degree view of each entity and understand its connectivity to other entities. The model has allowed applications to perform near real-time analytics and address complex business questions.
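The idea of treating attributes as first-class nodes can be sketched in a few lines. This is an in-memory stand-in, not the platform's Neo4j schema: the `AttributeGraph` class, its method names, and the sample data are all hypothetical. Each (attribute, value) pair is a node of its own, and a reverse index from attribute nodes to entities plays the role that graph indexes play in the model described above, so that set operations reduce to intersections.

```python
from collections import defaultdict


class AttributeGraph:
    """Toy graph in which every (attribute, value) pair is its own node."""

    def __init__(self):
        # entity node -> attribute nodes it is connected to
        self.entity_to_attrs = defaultdict(set)
        # attribute node -> entity nodes connected to it (the reverse index)
        self.attr_to_entities = defaultdict(set)

    def add(self, entity, attribute, value):
        """Connect an entity node to the attribute node (attribute, value)."""
        node = (attribute, value)
        self.entity_to_attrs[entity].add(node)
        self.attr_to_entities[node].add(entity)

    def entities_with(self, *attr_nodes):
        """Entities connected to all given attribute nodes (set intersection)."""
        sets = [self.attr_to_entities.get(node, set()) for node in attr_nodes]
        return set.intersection(*sets) if sets else set()

    def shared_attributes(self, a, b):
        """Attribute nodes through which two entities are connected."""
        return self.entity_to_attrs.get(a, set()) & self.entity_to_attrs.get(b, set())


# Hypothetical sample data.
g = AttributeGraph()
g.add("hcp:1", "specialty", "Cardiology")
g.add("hcp:1", "affiliation", "hospital:1")
g.add("hcp:2", "specialty", "Cardiology")
g.add("hcp:2", "affiliation", "hospital:2")
g.add("hcp:3", "specialty", "Oncology")
g.add("hcp:3", "affiliation", "hospital:1")

cardiologists_at_h1 = g.entities_with(
    ("specialty", "Cardiology"), ("affiliation", "hospital:1")
)
shared = g.shared_attributes("hcp:1", "hcp:3")
```

Because attribute values are nodes rather than properties buried inside an entity, a question such as "which professionals share a hospital" becomes a lookup on one node and an intersection of its neighbors, rather than a scan over every profile.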