An Interactive Health Data Science Platform for Exploratory Analysis of Health Outcomes – a Case Study with Colon Cancer

Abstract

Disease prediction is an important aspect of early disease detection and preventive care, with a wide range of applications in the healthcare domain. Previous studies have used image processing techniques and statistical and machine learning (ML) models to predict diseases. Prediction accuracy varies with the data type and the target. Data are often run through models under different conditions to identify what works best for a given scenario. This requires modifying code and running multiple iterations, which makes these methods usable only by people with technical skills. An interactive platform was developed that hides these technicalities and allows users to change options such as the target disease for prognosis, the feature selection method, the sample size, and the ML algorithm. With it, multiple approaches can be tried and compared to find a combination of options that gives an efficient outcome. A case study on colon cancer was performed to test the platform, using two feature selection algorithms and three ML models. Both selection methods identified the same features as significant for colon cancer prediction, although they ranked the features differently by score. Because the selected features were identical, the ML algorithms performed similarly with both selection methods. Random Forest, Logistic Regression, and Decision Tree achieved accuracies of 87%, 86%, and 83%, respectively.
