A Survey on Using Machine Learning to Predict Diabetes Early on

Author: Deepak Motwani, Satyendra Singh Rawat, Amit Kumar Mishra
Publication venue: Auricle Global Society of Education and Research
Publication date: 25/04/2024

Diabetes is a group of metabolic diseases caused by prolonged high blood sugar levels, and it is classed as a chronic disease. Where accurate early prediction is achievable, it can considerably lower the risk and severity of diabetes. Combining data mining methods with machine learning, a subfield of artificial intelligence, offers promise for prediction. Data is widely available in the healthcare industry, and information must be extracted from it in order to improve prognosis, diagnosis, therapy, medication development, and healthcare in general. According to the World Health Organisation's 2014 report, diabetes is among the chronic diseases with the fastest global growth rates. To illustrate the widely used techniques for early diabetes detection, which are based on technologies including machine learning and cloud computing, we have reviewed a few significant pieces of literature in this study. The findings suggest that artificial intelligence-based methods are more effective in the early detection of diabetes in patients. We also conducted an experiment with a Random Forest model on a diabetes dataset: the dataset was first resampled and then used to train and test the model. On all performance criteria, the Random Forest attained values above 96%.
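The abstract above does not say which resampling method was applied before training. As a hedged illustration only, the sketch below shows random oversampling of the minority class, one common way to balance a dataset. The function name `random_oversample` and the toy rows and labels are assumptions made for this example, not details from the paper.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original (un-resampled) data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (hypothetical): two features per record, label 1 = diabetic.
rows = [[148, 33.6], [85, 26.6], [183, 23.3], [89, 28.1], [137, 43.1], [116, 25.6]]
labels = [1, 0, 1, 0, 0, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling is usually applied to the training portion only, so that duplicated records cannot appear in both the training and the test set.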

International Journal on Recent and Innovation Trends in Computing and Communication

Diabetes Mellitus Disease Prediction using Machine Learning Algorithms

Author: Karnila Sri
Kurniawan Hendra
Purwati Neni
Rofianto Dani
Safitri Egi
Publication venue: Jurusan Informatika Universitas Tanjungpura
Publication date: 02/11/2024

Diabetes mellitus is a chronic disease with a rapidly increasing global prevalence, affecting around 422 million people, predominantly in low- and middle-income countries. Effective management of diabetes requires early detection and timely intervention. This study aims to develop an accurate predictive model for diabetes mellitus using three machine learning algorithms: Random Forest, Logistic Regression, and Decision Tree. The Pima Indians Diabetes dataset, comprising 768 patient records with various health indicators, was utilized for model training and evaluation. Exploratory data analysis revealed significant correlations between glucose levels, BMI, age, and diabetes risk. The dataset was split into 80% training and 20% testing sets. Models were validated using cross-validation and evaluated based on accuracy, precision, recall, and F1-score. Results indicated that Logistic Regression achieved the highest accuracy (75%) and balanced performance in identifying both positive and negative cases. Decision Tree excelled in recall, while Random Forest showed a slightly lower balance between precision and recall. The ROC curve analysis demonstrated that Random Forest had the highest AUC (0.82), followed by Logistic Regression (0.81) and Decision Tree (0.73). This study confirms that machine learning algorithms can effectively predict diabetes, providing valuable tools for early detection and intervention, ultimately reducing the global burden of diabetes mellitus.
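To make the evaluation protocol in the abstract above concrete, the following minimal sketch shows an 80/20 split of 768 record indices and the four reported metrics computed from binary predictions. The helper names `split_indices` and `classification_metrics`, the random seed, and the toy label vectors are assumptions for illustration; they are not the authors' code or data.

```python
import random

def split_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and split them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# 768 records, as in the Pima Indians Diabetes dataset.
train_idx, test_idx = split_indices(768)

# Toy labels and predictions (hypothetical), not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(len(train_idx), len(test_idx), metrics)
```

With these toy vectors there are three true positives, three true negatives, one false positive and one false negative, so all four metrics come out at 0.75.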

Jurnal Sistem dan Teknologi Informasi (JUSTIN)

Sensor-Assisted Weighted Average Ensemble Model for Detecting Major Depressive Disorder

Author: Chang Chuan-Yu
Gao Liang
Garg Akhil
Gutiérrez Reina Daniel
Mahendran Nivedhitha
Srinivasan Kathiravan
Vincent Durai Raj
Publication venue: MDPI AG
Publication date: 01/11/2019

Current methods of diagnosing depression depend entirely on self-report ratings or clinical interviews. These traditional methods are subjective, since the individual may or may not answer the questions genuinely. In this paper, the data were collected using self-report ratings and also using electronic smartwatches. This study aims to develop a weighted average ensemble machine learning model to predict major depressive disorder (MDD) with superior accuracy. The data were pre-processed and the essential features were selected using a correlation-based feature selection method. With the selected features, machine learning approaches such as Logistic Regression, Random Forest, and the proposed Weighted Average Ensemble Model are applied. The performance of the proposed model is assessed using the area under the Receiver Operating Characteristic curve. The results demonstrate that the proposed Weighted Average Ensemble model achieves better accuracy than the Logistic Regression and Random Forest approaches.
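A weighted average ensemble, in its general form, combines the class probabilities of several base models using per-model weights. The sketch below shows that general idea only. The abstract does not state how the weights were chosen, so the weights, the probability values and the function name `weighted_average_ensemble` are assumptions for illustration.

```python
def weighted_average_ensemble(prob_lists, weights):
    """Combine per-model positive-class probabilities into one
    weighted-average probability per sample."""
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]

# Hypothetical positive-class probabilities from two base models.
lr_probs = [0.2, 0.7, 0.6, 0.4]   # e.g. Logistic Regression
rf_probs = [0.4, 0.9, 0.3, 0.6]   # e.g. Random Forest
weights = [1.0, 3.0]              # assumed; could come from validation scores

ensemble_probs = weighted_average_ensemble([lr_probs, rf_probs], weights)
predictions = [1 if p >= 0.5 else 0 for p in ensemble_probs]
print(ensemble_probs, predictions)
```

One plausible design choice is to set each weight in proportion to the model's validation score, so that stronger base models have more influence on the final probability.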

Multidisciplinary Digital Publishing Institute

idUS. Depósito de Investigación Universidad de Sevilla

Microaneurysms detection in color fundus images using machine learning based on directional local contrast

Author: Chen Jiali
Chen Zhiqing
Hu Ante
Liu Haipeng
Long Shengchun
Zheng Dingchang
Publication venue: Springer Science and Business Media LLC
Publication date: 15/04/2020

BACKGROUND: As one of the major complications of diabetes, diabetic retinopathy (DR) is a leading cause of visual impairment and blindness due to delayed diagnosis and intervention. Microaneurysms appear as the earliest symptom of DR. Accurate and reliable detection of microaneurysms in color fundus images is of great importance for DR screening. METHODS: A microaneurysm detection method using machine learning based on directional local contrast (DLC) is proposed for the early diagnosis of DR. First, blood vessels were enhanced and segmented using an improved enhancement function based on analyzing the eigenvalues of the Hessian matrix. Next, with blood vessels excluded, microaneurysm candidate regions were obtained using shape characteristics and connected components analysis. After the image was segmented into patches, the features of each microaneurysm candidate patch were extracted, and each candidate patch was classified as microaneurysm or non-microaneurysm. The main contributions of our study are (1) making use of directional local contrast in microaneurysm detection for the first time, which improves microaneurysm classification, and (2) applying three different machine learning techniques for classification and comparing their performance for microaneurysm detection. The proposed algorithm was trained and tested on the e-ophtha MA database, and further tested on the independent DIARETDB1 database. Results of microaneurysm detection on the two databases were evaluated at lesion level and compared with existing algorithms. RESULTS: The proposed method achieved better performance than existing algorithms in accuracy and computation time. On the e-ophtha MA and DIARETDB1 databases, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was 0.87 and 0.86, respectively. The free-response ROC (FROC) score on the two databases was 0.374 and 0.210, respectively. The computation time per image with resolution of 2544×1969, 1400×960 and 1500×1152 is 29 s, 3 s and 2.6 s, respectively. CONCLUSIONS: The proposed method using machine learning based on directional local contrast of image patches can effectively detect microaneurysms in color fundus images and provide an effective scientific basis for early clinical DR diagnosis.
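The vessel enhancement step above relies on the eigenvalues of the 2×2 image Hessian at each pixel. The abstract does not give the authors' improved enhancement function, so the sketch below covers only the standard closed-form eigenvalue computation that such filters build on. The function name `hessian_eigenvalues` and the sample second-derivative values are assumptions for illustration.

```python
import math

def hessian_eigenvalues(ixx, ixy, iyy):
    """Eigenvalues of the symmetric 2x2 Hessian [[ixx, ixy], [ixy, iyy]],
    returned as (smaller magnitude, larger magnitude)."""
    mean = (ixx + iyy) / 2.0
    radius = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    lam_a, lam_b = mean - radius, mean + radius
    return tuple(sorted((lam_a, lam_b), key=abs))

# Hypothetical second derivatives at two pixels.
# Strong curvature across one axis only: the line-like pattern
# that vessel enhancement filters typically respond to.
line_like = hessian_eigenvalues(2.0, 0.0, 0.0)
# Curvature in both directions: a more blob-like pattern.
blob_like = hessian_eigenvalues(2.0, 1.0, 2.0)
print(line_like, blob_like)
```

In Hessian-based vessel filters generally, a tubular structure shows one eigenvalue close to zero (along the vessel) and one of large magnitude (across it), which is what distinguishes the first sample from the second.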

Coventry University Pure Portal

Predicting diabetes-related hospitalizations based on electronic health records

Author: Brisimi Theodora S.
Dai Wuyang
Paschalidis Ioannis Ch.
Wang Taiyao
Xu Tingting
Publication venue: SAGE Publications
Publication date: 01/12/2019

OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from the negative ones (non-hospitalized). The convergence of the new method was established and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. Our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters which can help interpret the classification results, thus increasing the trust of physicians in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
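The abstract above reports accuracy as the area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): 1 = hospitalized, 0 = not hospitalized.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples; it is used here for clarity, and a rank-based computation would be the usual choice for large patient cohorts.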

Boston University Institutional Repository (OpenBU)

Robust Decision Trees Against Adversarial Examples

Author: Boning Duane
Chen Hongge
Hsieh Cho-Jui
Zhang Huan
Publication date: 11/06/2019

Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models, and on how to make tree-based models robust against adversarial examples, is still limited. In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees: a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information gain based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real-world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.
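To give a feel for the worst-case objective described above, the sketch below scores a single threshold split (a decision stump on one feature) under a bounded perturbation of that feature. It is not the paper's tree-building algorithm, which approximates the inner minimizer efficiently. It is a brute-force illustration of that inner minimization for one candidate split, and the function name, the perturbation budget `eps` and the toy data are all assumptions.

```python
def worst_case_stump_accuracy(xs, ys, threshold, eps, left_label=0, right_label=1):
    """Accuracy of the stump 'x < threshold -> left_label, else right_label'
    when an adversary may move each x anywhere within [x - eps, x + eps].
    A sample counts as correct only if every reachable side predicts its label."""
    correct = 0
    for x, y in zip(xs, ys):
        reachable = set()
        if x - eps < threshold:      # some perturbed value falls on the left side
            reachable.add(left_label)
        if x + eps >= threshold:     # some perturbed value falls on the right side
            reachable.add(right_label)
        if reachable == {y}:
            correct += 1
    return correct / len(xs)

# Toy one-dimensional dataset (hypothetical).
xs = [1.0, 2.0, 2.9, 3.1, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

clean_acc = worst_case_stump_accuracy(xs, ys, threshold=3.0, eps=0.0)
robust_acc = worst_case_stump_accuracy(xs, ys, threshold=3.0, eps=0.2)
print(clean_acc, robust_acc)
```

With no perturbation the split at 3.0 classifies every sample correctly, but with a budget of 0.2 the two samples nearest the threshold can be pushed across it, so only four of six remain correct in the worst case. A robust tree learner would prefer splits whose worst-case score stays high.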

arXiv.org e-Print Archive

DSpace@MIT