
11 research outputs found

Predicting Current Glycated Hemoglobin Levels in Adults From Electronic Health Records: Validation of Multiple Logistic Regression Algorithm

Author: Alhassan Zakhriya
Alshammari Riyad
Budgen David
Moubayed Noura Al
Publication venue: JMIR Publications
Publication date: 01/07/2020

Background: Electronic health record (EHR) systems generate large datasets that can significantly enrich the development of medical predictive models. Several attempts have been made to investigate the effect of glycated hemoglobin (HbA1c) elevation on the prediction of diabetes onset. However, these models still need to be validated using EHR data collected from different populations. Objective: The aim of this study is to perform a replication study to validate, evaluate, and identify the strengths and weaknesses of replicating a predictive model that employed multiple logistic regression with EHR data to forecast HbA1c levels. The original study used data from a population in the United States, whereas this differentiated replication used a population in Saudi Arabia. Methods: Three models were developed and compared with the model created in the original study. The models were trained and tested using a larger dataset from Saudi Arabia with 36,378 records. The 10-fold cross-validation approach was used to measure the performance of the models. Results: Applying the method employed in the original study achieved an accuracy of 74% to 75% on the dataset collected from Saudi Arabia, compared with 77% on the population from the United States. The results also show a different ranking of predictor importance between the original study and the replication. For our population, the predictors ordered from most to least important are age, random blood sugar, estimated glomerular filtration rate, total cholesterol, non–high-density lipoprotein, and body mass index. Conclusions: This replication study shows that directly using the models (calculators) created with multiple logistic regression to predict HbA1c levels may not be appropriate for all populations. It reveals that the weighting of the predictors needs to be calibrated to the population used. However, the study does confirm that replicating the original study using a different population can help with predicting HbA1c levels from predictors that are routinely collected and stored in hospital EHR systems.
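
The abstract measures model performance with 10-fold cross-validation. The following is an illustrative sketch of that evaluation loop in plain Python, not the authors' code. `k_fold_indices` and `cross_val_accuracy` are hypothetical helper names, and `fit`/`predict` are placeholders for whatever model is being evaluated (for example, a multiple logistic regression). The split shown is a plain shuffled split. Stratification by outcome, which is often preferred for imbalanced labels, is not attempted.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once and split it into k (train, test) index pairs.

    Every index lands in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n, k)  # the first `extra` folds get one more element
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    splits = []
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, folds[f]))
    return splits


def cross_val_accuracy(X: Sequence, y: Sequence[int], fit: Callable, predict: Callable,
                       k: int = 10, seed: int = 0) -> float:
    """Mean test-fold accuracy; `fit(X, y)` returns a model, `predict(model, X)` returns labels."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

As a trivial usage example, a majority-class baseline can be passed as `fit=lambda X, y: max(set(y), key=y.count)` and `predict=lambda m, X: [m] * len(X)`. Such a baseline is a useful floor to compare the reported 74% to 77% accuracies against.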

Durham Research Online

Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms with Electronic Health Records

Author: Al Moubayed Noura
Alessa Ali
Alhassan Zakhriya
Alshammari Riyad
Budgen David
Watson Matthew
Publication venue: JMIR Publications
Publication date: 22/04/2021

Background: Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems such as diabetes. Early preventive interventions based on advanced predictive models that use electronic health record (EHR) data to identify such patients can ultimately help provide better health outcomes. Objective: Our study investigates the performance of several machine learning models for forecasting HbA1c elevation levels. We also investigate how using the patient's longitudinal EHR data affects the performance of the predictive models. Explainable methods have been employed to interpret the decisions made by the black-box models. Methods: This study employed Multiple Logistic Regression, Random Forest, Support Vector Machine, and Logistic Regression models, as well as a deep learning model (multi-layer perceptron), to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and explain the reasons behind their decisions. All models were trained and tested using a large dataset from Saudi Arabia with 18,844 unique patient records. Results: The machine learning models achieved promising results for predicting current HbA1c elevation risk. When employed with longitudinal data, the machine learning models outperformed the Multiple Logistic Regression model employed in the comparative study. The multi-layer perceptron model achieved an AUC-ROC of 83.22% when used with historical data. All models showed a close level of agreement on the contribution of the random blood sugar and age variables, with and without longitudinal data.
Conclusions: This study shows that machine learning models can provide promising results for the task of predicting whether current HbA1c levels are elevated (≥5.7%) or normal (<5.7%). Using the patient's longitudinal data improved performance and affected the relative importance of the predictors. The models showed results that are consistent with comparable studies.
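
The task above is a binary classification at the 5.7% HbA1c cut-off, scored with AUC-ROC. The sketch below is a minimal illustration, not the authors' pipeline. It shows the labelling rule and computes AUC-ROC in its rank (Mann-Whitney) form: the probability that a randomly chosen elevated patient receives a higher model score than a randomly chosen normal one. Unlike accuracy, this does not depend on a decision threshold. The function names are hypothetical.

```python
from typing import Sequence

HBA1C_THRESHOLD = 5.7  # percent; normal (< 5.7%) vs elevated (>= 5.7%), as in the abstract


def label_hba1c(value_percent: float) -> int:
    """Return 1 for elevated HbA1c (>= 5.7%), 0 for normal (< 5.7%)."""
    return 1 if value_percent >= HBA1C_THRESHOLD else 0


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This O(P*N) pairwise loop is meant for clarity, not speed.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage: labels for four readings, then AUC for four model scores.
labels = [label_hba1c(v) for v in [5.2, 5.9, 6.4, 5.6]]  # [0, 1, 1, 0]
auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])       # 0.75
```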

Durham Research Online

Type-2 diabetes mellitus diagnosis from time series clinical data using deep learning models

Author: Al Moubayed Noura
Alhassan Zakhriya
Alshammari Riyad
Budgen David
Daghstani Tahini
Hammer Barbara
Iliadis Lazaros
Kůrková Věra
Maglogiannis Ilias
Manolopoulos Yannis
McGough Stephen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Clinical data is usually observed and recorded at irregular intervals and includes evaluations, treatments, vital signs, and lab test results. These provide an invaluable source of information to help diagnose and understand medical conditions. In this work, we introduce the largest patient records dataset in diabetes research: King Abdullah International Research Centre Diabetes (KAIMRCD), which includes data for over 14k patients. KAIMRCD contains detailed information about each patient's visits and has been labelled against T2DM by clinicians. The data is processed as time series and then investigated using temporal predictive Deep Learning models with the goal of diagnosing Type 2 Diabetes Mellitus (T2DM). Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models are trained on KAIMRCD and are shown here to outperform classical machine learning approaches in the literature, with over 97% accuracy.
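
Recurrent models such as LSTMs and GRUs expect fixed-length sequences. The abstract does not describe how the irregular visits were converted, so the sketch below is one plausible approach under stated assumptions, not the paper's preprocessing. It groups visits by patient, orders them in time, appends the number of days since the previous visit as an extra feature to encode the irregular spacing, keeps the most recent `max_len` visits, and left-pads with zeros plus a mask. The record layout `(patient_id, visit_day, features)` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (patient_id, visit_day, feature_vector).
Record = Tuple[str, int, Sequence[float]]


def build_sequences(records: List[Record], max_len: int, n_features: int
                    ) -> Dict[str, Tuple[List[List[float]], List[int]]]:
    """Return {patient_id: (sequence, mask)} with sequence shape (max_len, n_features + 1).

    The extra last feature is days elapsed since the previous visit (0 for the first).
    mask[t] == 1 marks a real visit; 0 marks left padding.
    """
    by_patient = defaultdict(list)
    for pid, day, feats in records:
        if len(feats) != n_features:
            raise ValueError(f"expected {n_features} features for patient {pid}")
        by_patient[pid].append((day, [float(v) for v in feats]))
    out = {}
    for pid, visits in by_patient.items():
        visits.sort(key=lambda v: v[0])  # chronological order
        steps, prev_day = [], None
        for day, feats in visits:
            gap = 0.0 if prev_day is None else float(day - prev_day)
            steps.append(feats + [gap])
            prev_day = day
        recent = steps[-max_len:]  # keep the most recent visits
        pad = max_len - len(recent)
        seq = [[0.0] * (n_features + 1) for _ in range(pad)] + recent
        mask = [0] * pad + [1] * len(recent)
        out[pid] = (seq, mask)
    return out
```

Computing the gaps before truncation means the oldest retained visit still records its distance from the visit that was dropped. Padding on the left keeps the latest visit at the final time step, where a recurrent model's last hidden state is usually read.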

Durham Research Online

Crossref

Improving Accuracy for Diabetes Mellitus Prediction by Using Deepnet

Author: Alshammari Abdulwahhab
Alshammari Riyad
Atiyah Noorah
Daghistani Tahani
Publication venue: University of Illinois at Chicago Library
Publication date: 24/07/2020

Diabetes is a salient issue and a significant health care concern for many nations, and its prevalence is forecast to rise. Hence, building a machine learning prediction model to assist in the identification of diabetic patients is of great interest. This study aims to create a machine learning model that is capable of predicting diabetes with high performance. The study used the BigML platform to train four machine learning algorithms, namely Deepnet, Models (decision tree), Ensemble, and Logistic Regression, on datasets collected from the Ministry of National Guard Hospital Affairs (MNGHA) in Saudi Arabia between 2013 and 2015. The criteria for the comparative evaluation of the four algorithms were accuracy, precision, recall, F-measure, and the Phi coefficient. Results show that the Deepnet algorithm achieved higher performance than the other machine learning algorithms across the various evaluation metrics.
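
The five evaluation criteria listed above can all be derived from a 2x2 confusion matrix. BigML reports them itself. The sketch below only illustrates the standard definitions and is not BigML's implementation. For a binary problem, the Phi coefficient is the same quantity as the Matthews correlation coefficient. The function name is hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F-measure (F1) and Phi coefficient from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Phi = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "phi": phi}
```

For example, `binary_metrics(40, 10, 20, 30)` gives accuracy 0.70, precision 0.80, recall about 0.667, F-measure about 0.727, and Phi about 0.408. Phi being much lower than accuracy is why it is useful as a criterion when classes are unequal in size.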

University of Illinois at Chicago: Journals@UIC

Identification of VoIP encrypted traffic using a machine learning approach

Author: A. Nur Zincir-Heywood
Riyad Alshammari
Publication venue: Elsevier BV
Publication date: 01/01/2015

We investigate the performance of three different machine learning algorithms, namely C5.0, AdaBoost, and Genetic Programming (GP), in generating robust classifiers for identifying VoIP encrypted traffic. To this end, a novel machine-learning-based approach (Alshammari and Zincir-Heywood, 2011) is employed to generate robust signatures for classifying VoIP encrypted traffic. We compute statistics over network flows to extract a feature set that excludes payload information as well as source and destination port numbers and IP addresses. Our results show that finding and employing the most suitable sampling and machine learning techniques can significantly improve the performance of VoIP traffic classification.
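
The approach above derives features only from flow statistics, without payload, ports, or IP addresses. The exact feature set comes from the cited 2011 work and is not reproduced here. The following is an assumed, illustrative subset based on packet lengths and inter-arrival times for one direction of a flow. The packet representation `(timestamp_seconds, length_bytes)` and the function name are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# Hypothetical packet representation for one flow direction: (timestamp_seconds, length_bytes).
Packet = Tuple[float, int]


def flow_features(packets: List[Packet]) -> dict:
    """Payload-free flow statistics; no port numbers or IP addresses are consulted."""
    if not packets:
        raise ValueError("empty flow")
    pkts = sorted(packets)  # order by timestamp
    times = [t for t, _ in pkts]
    lengths = [n for _, n in pkts]
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "n_packets": len(pkts),
        "total_bytes": sum(lengths),
        "duration": times[-1] - times[0],
        "mean_len": mean(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "std_len": pstdev(lengths),
        "mean_iat": mean(iats) if iats else 0.0,
    }
```

Each flow becomes one fixed-length feature dictionary, which can then be fed to any tabular classifier such as the C5.0, AdaBoost, or GP learners compared in the paper. Because nothing in it depends on endpoints or content, such features remain usable when traffic is encrypted or runs on non-standard ports.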

Elsevier - Publisher Connector

Directory of Open Access Journals

Transactions on large-scale data- and knowledge-centered systems XXXV

Author: Hameurlain Abdelkader
Küng Josef
Razzak Imran
Riyad Alshammari
Sakr Sherif
Wagner Roland
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2017

CERN Document Server

Stacked Denoising Autoencoders for Mortality Risk Prediction Using Imbalanced Clinical Data

Author: Al Moubayed Noura
Alhassan Zakhriya
Alshammari Riyad
Budgen David
Daghstani Tahani
McGough A. Stephen
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2018

Clinical data, such as evaluations, treatments, vital signs, and lab test results, are usually observed and recorded in hospital systems. Using such data to help physicians evaluate the mortality risk of in-hospital patients provides an invaluable source of information that can ultimately help improve healthcare services. In particular, quick and accurate predictions of mortality can be valuable for physicians who are making decisions about interventions. In this work, we introduce the use of a predictive Deep Learning model to help evaluate the mortality risk of in-hospital patients. A Stacked Denoising Autoencoder (SDA) has been trained using a unique time-stamped dataset (King Abdullah International Research Center - KAIMRC), which is naturally imbalanced. The results are compared with those from common deep learning approaches using different methods for data balancing. The proposed model aims to overcome the problem of imbalanced data and outperforms common deep learning approaches, with a macro-averaged recall of 77.13%.
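
With imbalanced outcomes such as in-hospital mortality, plain accuracy rewards always predicting the majority class. Macro-averaged recall instead gives each class equal weight. The sketch below shows that metric plus random oversampling. The abstract does not name the balancing methods it compared, so random oversampling is included only as one common example. Both function names are hypothetical. Oversampling should be applied to the training split only, so that duplicated rows never leak into evaluation.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def macro_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class recall, so a rare class counts as much as a common one."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(X: List, y: List[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for c, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(c)
    return X_out, y_out
```

For instance, with true labels `[0, 0, 0, 0, 1, 1]` and predictions `[0, 0, 0, 1, 1, 0]`, the per-class recalls are 0.75 and 0.5, so the macro recall is 0.625.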

Durham Research Online

Crossref

Collaborative Denoising Autoencoder for High Glycated Haemoglobin Prediction

Author: Al Moubayed Noura
Alessa Ali
Alhassan Zakhriya
Alshammari Riyad
Budgen David
Daghstani Tahini
Karpov Pavel
Kůrková Věra
Tetko Igor V.
Theis Fabian
Publication venue: Springer Verlag
Publication date: 01/01/2019

A pioneering study is presented demonstrating that the presence of high glycated haemoglobin (HbA1c) levels in a patient's blood can be reliably predicted from routinely collected clinical data. This paves the way for early detection of Type-2 Diabetes Mellitus (T2DM). It will also save healthcare providers a major cost associated with administering and assessing clinical tests for HbA1c. A novel collaborative denoising autoencoder framework is used to address this challenge. The framework builds an independent denoising autoencoder model for each of the high and low HbA1c levels, which extracts feature representations in the latent space. A baseline model using just three features (patient age, triglycerides, and glucose level) achieves a 76% F1-score with an SVM classifier. The collaborative denoising autoencoder uses 78 features and can predict HbA1c level with an 81% F1-score.
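
Two ingredients of the framework described above can be sketched without a deep learning library. The first is partitioning the training rows by HbA1c class, so that one autoencoder can be fit per class. The second is the input corruption that makes an autoencoder "denoising": it is trained to reconstruct the clean vector from a corrupted copy. The abstract does not state which noise was used. Masking noise, which zeroes each feature independently with some probability, is assumed here as a common choice in the denoising autoencoder literature. Both function names are hypothetical, and the autoencoders themselves are not shown.

```python
import random
from typing import Dict, List, Sequence


def split_by_label(X: Sequence[List[float]], y: Sequence[int]) -> Dict[int, List[List[float]]]:
    """Partition rows by class (0 = low, 1 = high HbA1c) so each class gets its own model."""
    groups: Dict[int, List[List[float]]] = {0: [], 1: []}
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return groups


def mask_corrupt(x: Sequence[float], drop_prob: float, rng: random.Random) -> List[float]:
    """Masking noise: set each feature to 0.0 independently with probability drop_prob."""
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    return [0.0 if rng.random() < drop_prob else v for v in x]
```

During training, each per-class model would receive `mask_corrupt(x, p, rng)` as input and the untouched `x` as the reconstruction target. Passing an explicit `random.Random` instance keeps the corruption reproducible across runs.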

Durham Research Online