5 research outputs found

    Data Mining and Machine Learning to Predict Acute Coronary Syndrome Mortality

    Get PDF
    This thesis has investigated and demonstrated the potential for developing prediction models using Machine Learning(ML) algorithms on registry datasets. Many current Acute Coronary Syndrome (ACS) prediction models, were developed using traditional statistical methods. In an era of big-data evolution, ML offers a spectrum of algorithms that aid in generating prediction models for ACS. This study has explored 29 algorithms with which to build ACS prediction models for Asian (Malaysia) and Western (Leeds, UK) registries, covering patients with all types of ACS and those with the new standard ACS treatments. The internal and external validation of the models present satisfactory calibration measures, indicating the ability of ML algorithms to produce competitive models in comparison to traditional statistical methods. To achieve simpler, yet competitive predictive performance, comprehensive ML feature selection methods have been evaluated, and Correlation-Based-Feature-Selection(CFS) emerged as the best method. This thesis also has evaluated the potential of predictors of existing ACS models to be adapted to other registries‘ data. Despite different regions and different population characteristics, most of the existing predictors remains constant with the outcome. Thus, the findings suggest that, with some adjustments customized to the registry, the existing predictors can be adopted to develop a simple model and expedite the model development process. Furthermore, the strength of the predictors of each clinical categories has also been evaluated. The results suggest that, to construct a satisfactory ACS model, combination of predictors from various clinical events is essential. At the very least, to achieve a satisfactory model, combination of demographic, medical history, and clinical presentation information categories is required. However, predictors from medication history category has found to be worthless in terms of contributing to a better prediction model. Next, this study has investigated classifier degradation in ML model development. The findings suggest that the overlapping instances in minority class of imbalanced dataset and missing values are the main problems of classifier degradation. New methods i.e. the overlapped-undersampling method to handle imbalanced dataset and the mean-clustering-imputation method to handle missing values have been introduced. The overlapped-undersampling failed to boost the model performance of the datasets. Nevertheless, the results suggest that more training samples on imbalanced datasets are sufficient to produce satisfactory models. The mean-clustering-imputation method produced better models compare to the simple imputation method and imputation method embedded in an algorithm. However, removing instances with missing data resulted in superior models
    corecore