11 research outputs found

    Predicting Current Glycated Hemoglobin Levels in Adults From Electronic Health Records: Validation of Multiple Logistic Regression Algorithm

    Background: Electronic health record (EHR) systems generate large datasets that can significantly enrich the development of medical predictive models. Several attempts have been made to investigate the effect of glycated hemoglobin (HbA1c) elevation on the prediction of diabetes onset. However, these models still need to be validated using EHR data collected from different populations.
    Objective: This replication study aims to validate and evaluate a predictive model that employed multiple logistic regression with EHR data to forecast HbA1c levels, and to identify the strengths and weaknesses of replicating it. The original study used data from a population in the United States; this differentiated replication used a population in Saudi Arabia.
    Methods: A total of 3 models were developed and compared with the model created in the original study. The models were trained and tested using a larger dataset from Saudi Arabia with 36,378 records. Model performance was measured with 10-fold cross-validation.
    Results: Applying the method employed in the original study achieved an accuracy of 74% to 75% on the dataset collected from Saudi Arabia, compared with 77% obtained on the population from the United States. The results also show a different ranking of importance for the predictors between the original study and the replication. For our population, the predictors in order from most to least important are age, random blood sugar, estimated glomerular filtration rate, total cholesterol, non–high-density lipoprotein, and body mass index.
    Conclusions: This replication study shows that direct use of the models (calculators) created using multiple logistic regression to predict the level of HbA1c may not be appropriate for all populations. The weighting of the predictors needs to be calibrated to the population used. However, the study does confirm that replicating the original study using a different population can help with predicting the levels of HbA1c from predictors that are routinely collected and stored in hospital EHR systems.
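
The abstract above reports performance measured with 10-fold cross-validation. As a rough illustration only, the sketch below shows how records can be split into 10 disjoint folds and a score averaged over them. The function names and the `evaluate` callback are hypothetical and are not taken from the study; the shuffling and fold assignment are assumptions.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Split indices 0..n_records-1 into k disjoint, near-equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_records, evaluate, k=10, seed=0):
    """Average a score over k train/test splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that fits a
    model on the training indices and returns its score on the test indices.
    """
    folds = k_fold_indices(n_records, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

Each record lands in exactly one test fold, so every record is used for testing once and for training nine times.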

    Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms with Electronic Health Records

    Background: Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems such as diabetes. Early preventive interventions based upon advanced predictive models using electronic health record (EHR) data for identifying such patients can ultimately help provide better health outcomes.
    Objective: Our study investigates the performance of several machine learning models in forecasting HbA1c elevation. We also investigate how using the patient's longitudinal EHR data affects the performance of the predictive models. Explainable methods have been employed to interpret the decisions made by the black-box models.
    Methods: This study employed Multiple Logistic Regression, Random Forest, Support Vector Machine and Logistic Regression models, as well as a deep learning model (multi-layer perceptron), to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large dataset from Saudi Arabia with 18,844 unique patient records.
    Results: The machine learning models achieved promising results for predicting current HbA1c elevation risk. When employed with longitudinal data, the machine learning models outperformed the Multiple Logistic Regression model employed in the comparative study. The multi-layer perceptron model achieved an AUC-ROC of 83.22% when used with historical data. All models showed a close level of agreement on the contribution of the random blood sugar and age variables, with and without longitudinal data.
    Conclusions: This study shows that machine learning models can provide promising results for the task of predicting whether current HbA1c levels are elevated (≥5.7%) or normal (<5.7%). Utilizing the patient's longitudinal data improved the performance and affected the relative importance of the predictors used. The models showed results that are consistent with comparable studies.
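
The abstract above describes a binary task (normal <5.7% versus elevated ≥5.7% HbA1c) scored with AUC-ROC. The following is a minimal sketch of both ideas, not the study's code: `label_hba1c` and `auc_roc` are hypothetical names, and the AUC is computed by pairwise comparison, which is one standard equivalent of the area under the ROC curve.

```python
def label_hba1c(hba1c_percent, threshold=5.7):
    """Return 1 for elevated HbA1c (>= threshold), 0 for normal."""
    return 1 if hba1c_percent >= threshold else 0


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable on a dataset with thousands of records.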

    Type-2 diabetes mellitus diagnosis from time series clinical data using deep learning models.

    Clinical data is usually observed and recorded at irregular intervals and includes evaluations, treatments, vital signs and lab test results. These provide an invaluable source of information to help diagnose and understand medical conditions. In this work, we introduce the largest patient records dataset in diabetes research: King Abdullah International Research Centre Diabetes (KAIMRCD), which includes over 14k patient records. KAIMRCD contains detailed information about each patient's visits and has been labelled against T2DM by clinicians. The data is processed as time series and then investigated using temporal predictive Deep Learning models with the goal of diagnosing Type 2 Diabetes Mellitus (T2DM). Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models are trained on KAIMRCD and are demonstrated here to outperform classical machine learning approaches in the literature, with over 97% accuracy.

    Improving Accuracy for Diabetes Mellitus Prediction by Using Deepnet

    Diabetes is a salient issue and a significant health care concern for many nations, and its prevalence is forecast to rise. Hence, building a machine learning prediction model to assist in the identification of diabetic patients is of great interest. This study aims to create a machine learning model that is capable of predicting diabetes with high performance. The study used the BigML platform to train four machine learning algorithms, namely Deepnet, Models (decision tree), Ensemble and Logistic Regression, on data sets collected from the Ministry of National Guard Hospital Affairs (MNGHA) in Saudi Arabia between 2013 and 2015. The comparative evaluation criteria for the four algorithms were Accuracy, Precision, Recall, F-measure and Phi coefficient. Results show that the Deepnet algorithm achieved higher performance than the other machine learning algorithms across the evaluation metrics.
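
The five evaluation criteria named above can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; it is not BigML's implementation, and the function name and the zero-division fallbacks are assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F-measure and the phi
    coefficient from confusion-matrix counts. Undefined ratios
    (zero denominators) fall back to 0.0 by assumption."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Phi coefficient (also known as the Matthews correlation coefficient).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "phi": phi,
    }
```

Unlike accuracy, the phi coefficient stays near zero for a classifier that always predicts the majority class, which makes it a useful complement when diabetic and non-diabetic cases are unevenly represented.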

    Identification of VoIP encrypted traffic using a machine learning approach

    We investigate the performance of three different machine learning algorithms, namely C5.0, AdaBoost and Genetic Programming (GP), to generate robust classifiers for identifying VoIP encrypted traffic. To this end, a novel approach (Alshammari and Zincir-Heywood, 2011) based on machine learning is employed to generate robust signatures for classifying VoIP encrypted traffic. We apply statistical calculations to network flows to extract a feature set that excludes payload information as well as information based on source and destination port numbers and IP addresses. Our results show that finding and employing the most suitable sampling and machine learning technique can improve the performance of classifying VoIP significantly.
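
To make the idea of payload-free, address-free flow features concrete, here is a small sketch that summarises a flow using only packet timestamps and sizes. The specific features chosen (counts, duration, size and inter-arrival statistics) and the `flow_features` name are assumptions for illustration; the abstract does not list the feature set actually used.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarise one flow from (timestamp_seconds, size_bytes) tuples.

    No payload bytes, port numbers or IP addresses are used, only
    timing and size statistics.
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")
    ordered = sorted(packets)  # order packets by timestamp
    times = [t for t, _ in ordered]
    sizes = [s for _, s in ordered]
    # Inter-arrival times between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "packet_count": len(ordered),
        "duration": times[-1] - times[0],
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }
```

Feature vectors of this kind can be fed to any of the classifiers named above, and they remain available when the payload is encrypted.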

    Stacked Denoising Autoencoders for Mortality Risk Prediction Using Imbalanced Clinical Data

    Clinical data, such as evaluations, treatments, vital signs and lab test results, are usually observed and recorded in hospital systems. Using such data to help physicians evaluate the mortality risk of in-hospital patients provides an invaluable source of information that can ultimately help improve healthcare services. In particular, quick and accurate predictions of mortality can be valuable for physicians who are making decisions about interventions. In this work we introduce the use of a predictive Deep Learning model to help evaluate the mortality risk for in-hospital patients. A Stacked Denoising Autoencoder (SDA) has been trained using a unique time-stamped dataset (King Abdullah International Research Center - KAIMRC) which is naturally imbalanced. The results are compared to those from common deep learning approaches, using different methods for data balancing. The proposed model aims to overcome the problem of imbalanced data, and outperforms common deep learning approaches with a macro-averaged recall of 77.13%.
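
Macro-averaged recall, the metric reported above, averages the per-class recall so that a rare class (such as in-hospital deaths) counts as much as the common one. The sketch below uses the standard definition; the function name is hypothetical and the toy labels in the usage note are invented for illustration.

```python
def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of true members of class c, and how many were recovered.
        support = sum(1 for y in y_true if y == c)
        hits = sum(1 for y, p in zip(y_true, y_pred) if y == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)
```

For example, with 8 survivors and 2 deaths, a model that predicts "survivor" for everyone is 80% accurate but has a macro recall of only 0.5, because it recovers none of the minority class.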

    Collaborative Denoising Autoencoder for High Glycated Haemoglobin Prediction

    A pioneering study is presented demonstrating that the presence of high glycated haemoglobin (HbA1c) levels in a patient’s blood can be reliably predicted from routinely collected clinical data. This paves the way for early detection of Type-2 Diabetes Mellitus (T2DM) and will save healthcare providers a major cost associated with the administration and assessment of clinical tests for HbA1c. A novel collaborative denoising autoencoder framework is used to address this challenge. The framework builds an independent denoising autoencoder model for each of the high and low HbA1c levels, which extracts feature representations in the latent space. A baseline model using just three features (patient age, triglycerides and glucose level) achieves a 76% F1-score with an SVM classifier. The collaborative denoising autoencoder uses 78 features and can predict HbA1c level with an 81% F1-score.