A Survey on Using Machine Learning to Predict Diabetes Early on
Diabetes is a category of chronic metabolic disease caused by a prolonged high blood sugar level. If accurate early prediction is achievable, it can considerably lower the risk and severity of diabetes. Combining data mining methods with machine learning, a subfield of artificial intelligence, offers promise in the field of prediction. Data is widely available in the healthcare industry, and information must be extracted from it in order to improve prognosis, diagnosis, therapy, medication development, and healthcare in general. According to the World Health Organisation's 2014 report, diabetes is one of the chronic diseases with the fastest global growth rates. In this study, we have reviewed a few significant pieces of literature to illustrate the widely used techniques for early diabetes detection, which are based on technologies including machine learning and cloud computing. The findings suggested that artificial intelligence-based methods are more effective in the early detection of diabetes in patients. We also conducted an experiment with the Random Forest model on a diabetes dataset. First, the dataset is resampled and then used to train and test the Random Forest model. On all performance criteria, the Random Forest attained values above 96%.
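The abstract does not say which resampling scheme was used. As a minimal sketch, assuming plain random oversampling of the minority class (an assumption, not the authors' stated method), balancing the classes could look like this. The function name `random_oversample` and the toy rows are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: the source only says the dataset "is resampled"; random
    oversampling is one common choice and is used here purely for illustration.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): four negative records and two positive records.
rows = [[85, 26.6], [89, 28.1], [116, 25.6], [78, 31.0], [148, 33.6], [183, 23.3]]
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

Oversampling only the training split, after the train/test split has been made, keeps duplicated rows from appearing in both sets, which would otherwise inflate the reported scores.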
Diabetes Mellitus Disease Prediction using Machine Learning Algorithms
Diabetes mellitus is a chronic disease with a rapidly increasing global prevalence, affecting around 422 million people, predominantly in low- and middle-income countries. Effective management of diabetes requires early detection and timely intervention. This study aims to develop an accurate predictive model for diabetes mellitus using three machine learning algorithms: Random Forest, Logistic Regression, and Decision Tree. The Pima Indians Diabetes dataset, comprising 768 patient records with various health indicators, was utilized for model training and evaluation. Exploratory data analysis revealed significant correlations between glucose levels, BMI, age, and diabetes risk. The dataset was split into 80% training and 20% testing sets. Models were validated using cross-validation and evaluated based on accuracy, precision, recall, and F1-score. Results indicated that Logistic Regression achieved the highest accuracy (75%) and balanced performance in identifying both positive and negative cases. Decision Tree excelled in recall, while Random Forest showed a slightly lower balance between precision and recall. The ROC curve analysis demonstrated that Random Forest had the highest AUC (0.82), followed by Logistic Regression (0.81) and Decision Tree (0.73). This study confirms that machine learning algorithms can effectively predict diabetes, providing valuable tools for early detection and intervention, ultimately reducing the global burden of diabetes mellitus.
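The evaluation setup described above (an 80/20 split, then accuracy, precision, recall, and F1-score) can be sketched with the standard library alone. This is an illustrative sketch, not the study's code: the helper names, the random seed, and the toy labels are assumptions, and only the record count (768) and the split ratio come from the abstract.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of the items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 768 record indices, matching the dataset size quoted in the abstract.
train_ids, test_ids = split_train_test(range(768))

# Toy labels (hypothetical) to show the metric calculation.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```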
Sensor-Assisted Weighted Average Ensemble Model for Detecting Major Depressive Disorder
The present methods of diagnosing depression depend entirely on self-report
ratings or clinical interviews. These traditional methods are subjective, because individuals may
not answer the questions genuinely. In this paper, the data were collected using
self-report ratings and also using smartwatches. This study aims to develop a weighted
average ensemble machine learning model to predict major depressive disorder (MDD) with superior
accuracy. The data were pre-processed and the essential features were selected using a
correlation-based feature selection method. With the selected features, machine learning approaches
such as Logistic Regression, Random Forest, and the proposed Weighted Average Ensemble Model are
applied. Further, to assess the performance of the proposed model, the area under the Receiver
Operating Characteristic curve has been used. The results demonstrate that the proposed
Weighted Average Ensemble model performs with better accuracy than the Logistic Regression and
the Random Forest approaches.
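A weighted average ensemble combines the predicted probabilities of its base models using per-model weights. The sketch below shows the general technique only. The abstract does not state how the weights are chosen, so the weights, the probabilities, and the 0.5 decision threshold here are hypothetical.

```python
def weighted_average_ensemble(prob_lists, weights):
    """Combine per-model positive-class probabilities with a weighted average."""
    if len(prob_lists) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


# Hypothetical probabilities from two base models for four participants.
logistic_regression_probs = [0.2, 0.7, 0.4, 0.9]
random_forest_probs = [0.1, 0.8, 0.6, 0.7]

# Hypothetical weights; in practice these might be tuned on validation data.
weights = [1.0, 3.0]

ensemble_probs = weighted_average_ensemble(
    [logistic_regression_probs, random_forest_probs], weights
)
ensemble_labels = [int(p >= 0.5) for p in ensemble_probs]
```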
Microaneurysms detection in color fundus images using machine learning based on directional local contrast
BACKGROUND: As one of the major complications of diabetes, diabetic retinopathy (DR) is a leading cause of visual impairment and blindness due to delayed diagnosis and intervention. Microaneurysms appear as the earliest symptom of DR. Accurate and reliable detection of microaneurysms in color fundus images has great importance for DR screening. METHODS: A microaneurysm detection method using machine learning based on directional local contrast (DLC) is proposed for the early diagnosis of DR. First, blood vessels were enhanced and segmented using an improved enhancement function based on analyzing the eigenvalues of the Hessian matrix. Next, with blood vessels excluded, microaneurysm candidate regions were obtained using shape characteristics and connected components analysis. After the image was segmented into patches, the features of each microaneurysm candidate patch were extracted, and each candidate patch was classified as microaneurysm or non-microaneurysm. The main contributions of our study are (1) using directional local contrast in microaneurysm detection for the first time, which supports better microaneurysm classification, and (2) applying three different machine learning techniques for classification and comparing their performance for microaneurysm detection. The proposed algorithm was trained and tested on the e-ophtha MA database, and further tested on the independent DIARETDB1 database. Results of microaneurysm detection on the two databases were evaluated at the lesion level and compared with existing algorithms. RESULTS: The proposed method achieved better performance than existing algorithms in accuracy and computation time. On the e-ophtha MA and DIARETDB1 databases, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was 0.87 and 0.86, respectively. The free-response ROC (FROC) score on the two databases was 0.374 and 0.210, respectively.
The computation time per image with resolution of 2544×1969, 1400×960 and 1500×1152 is 29 s, 3 s and 2.6 s, respectively. CONCLUSIONS: The proposed method using machine learning based on directional local contrast of image patches can effectively detect microaneurysms in color fundus images and provide an effective scientific basis for early clinical DR diagnosis.
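The candidate-extraction step mentioned above (connected components analysis with shape characteristics) can be illustrated with a small pure-Python sketch. It labels 8-connected regions in a binary mask and keeps those whose pixel area falls in a given range. This is not the paper's implementation: the area bounds, the use of area as the only shape criterion, and the toy mask are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return a list of 8-connected components, each a list of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def filter_by_area(components, min_area, max_area):
    """Keep components whose pixel count lies in [min_area, max_area]."""
    return [comp for comp in components if min_area <= len(comp) <= max_area]


# Toy binary mask (hypothetical): a small blob, a single noisy pixel, a large region.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
components = connected_components(mask)
candidates = filter_by_area(components, min_area=2, max_area=5)
```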
Predicting diabetes-related hospitalizations based on electronic health records
OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from the negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. It is found that our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters which can help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant as it has been estimated that in the USA alone, about $5.8 billion are spent each year on diabetes-related hospitalizations that could be prevented.
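The accuracy figures above are reported as area under the ROC curve. As a minimal sketch of that metric, the AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The labels and scores below are hypothetical and do not come from the study.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))


# Hypothetical hospitalization labels (1 = hospitalized) and model scores.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
auc = roc_auc(y_true, scores)
```

The pairwise loop is quadratic in the number of samples, so for a large patient set a rank-based formulation would be the more practical choice.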
Robust Decision Trees Against Adversarial Examples
Although adversarial examples and model robustness have been extensively
studied in the context of linear models and neural networks, research on this
issue in tree-based models and how to make tree-based models robust against
adversarial examples is still limited. In this paper, we show that tree-based
models are also vulnerable to adversarial examples and develop a novel
algorithm to learn robust trees. At its core, our method aims to optimize the
performance under the worst-case perturbation of input features, which leads to
a max-min saddle point problem. Incorporating this saddle point objective into
the decision tree building procedure is non-trivial due to the discrete nature
of trees --- a naive approach to finding the best split according to this
saddle point objective will take exponential time. To make our approach
practical and scalable, we propose efficient tree building algorithms by
approximating the inner minimizer in this saddle point problem, and present
efficient implementations for classical information gain based trees as well as
state-of-the-art tree boosting models such as XGBoost. Experimental results on
real world datasets demonstrate that the proposed algorithms can substantially
improve the robustness of tree-based models against adversarial examples.
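The max-min idea in the abstract can be illustrated on the simplest possible tree, a single threshold split on one feature. The sketch below is not the paper's algorithm, which targets information gain and boosted trees and approximates the inner minimizer. It assumes a plain 0/1 accuracy objective and a perturbation budget `eps` on the feature value, and all names and data are hypothetical.

```python
def worst_case_accuracy(xs, ys, threshold, eps, left_label=0, right_label=1):
    """Accuracy of a one-feature split when each x may be shifted by up to eps.

    A point counts as correct only if every prediction the adversary can
    reach by moving x within [x - eps, x + eps] equals the true label.
    """
    correct = 0
    for x, y in zip(xs, ys):
        reachable = set()
        if x - eps < threshold:
            reachable.add(left_label)   # the point can land left of the split
        if x + eps >= threshold:
            reachable.add(right_label)  # the point can land right of the split
        if reachable == {y}:
            correct += 1
    return correct / len(xs)


def most_robust_threshold(xs, ys, eps, candidates):
    """Outer maximization: pick the candidate with the best worst-case accuracy."""
    return max(candidates, key=lambda t: worst_case_accuracy(xs, ys, t, eps))


# Toy one-dimensional data (hypothetical).
xs = [1.0, 2.0, 4.0, 6.5, 9.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]

clean_at_4_5 = worst_case_accuracy(xs, ys, threshold=4.5, eps=0.0)
robust_at_4_5 = worst_case_accuracy(xs, ys, threshold=4.5, eps=1.0)
robust_at_5_5 = worst_case_accuracy(xs, ys, threshold=5.5, eps=1.0)
chosen = most_robust_threshold(xs, ys, eps=1.0, candidates=[4.5, 5.5])
```

Both thresholds separate the unperturbed data perfectly, but only the one with enough margin on each side keeps every point correct under perturbation, which is the gap a robust splitting criterion is meant to expose.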
