1,601 research outputs found

    Machine Learning Approach for the Early Prediction of the Risk of Overweight and Obesity in Young People

    Obesity is a major global concern, with more than 2.1 billion people overweight or obese worldwide, which amounts to almost 30% of the global population. If the current trend continues, the overweight and obese population is likely to increase to 41% by 2030. Individuals developing signs of weight gain or obesity are also at risk of developing serious illnesses such as type 2 diabetes, respiratory problems, heart disease and stroke. Intervention measures such as physical activity and healthy eating can be a fundamental component of maintaining a healthy lifestyle. Therefore, it is essential to detect childhood obesity as early as possible. This paper utilises the vast amount of data available via the UK's Millennium Cohort Study to construct a machine-learning-driven model that predicts which young people are at risk of becoming overweight or obese. Childhood BMI values from ages 3, 5, 7 and 11 are used to predict which adolescents are at risk of being overweight or obese at age 14. There is an inherent imbalance in the dataset between individuals with normal BMI and those at risk. The results obtained are encouraging, and a prediction accuracy of over 90% for the target class has been achieved. Various issues relating to data preprocessing and prediction accuracy are addressed and discussed.

    Predicting Obesity in Adults Using Machine Learning Techniques: An Analysis of Indonesian Basic Health Research 2018

    Obesity is strongly associated with multiple risk factors. It contributes significantly to an increased risk of chronic disease morbidity and mortality worldwide. There are various challenges to better understanding the association between risk factors and the occurrence of obesity. The traditional regression approach limits analysis to a small number of predictors and imposes assumptions of independence and linearity. Machine Learning (ML) methods offer an alternative approach to the analysis of obesity data. This study assesses the ability of three ML methods, namely Logistic Regression, Classification and Regression Trees (CART), and Naïve Bayes, to identify the presence of obesity using publicly available health data, as an attempt to go beyond traditional prediction models, and compares the performance of the three methods. The main objective of this study is to establish a set of risk factors for obesity in adults among the available study variables. Furthermore, we address data imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to predict obesity status based on the risk factors available in the dataset. This study indicates that the Logistic Regression method shows the highest performance. Nevertheless, kappa coefficients show only moderate concordance between predicted and measured obesity. Location, marital status, age group, education, sweet drinks, fatty/oily foods, grilled foods, preserved foods, seasoning powders, soft/carbonated drinks, alcoholic drinks, mental-emotional disorders, diagnosed hypertension, physical activity, smoking, and fruit and vegetable consumption are significant in predicting obesity status in adults.
Identifying these risk factors could inform health authorities in designing or modifying existing policies for better control of chronic diseases, especially in relation to risk factors associated with obesity. Moreover, applying ML methods to publicly available health data, such as the Indonesian Basic Health Research (RISKESDAS), is a promising strategy to fill the gap for a more robust understanding of the associations of multiple risk factors in predicting health outcomes.
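
The abstract above does not say how SMOTE was implemented, so the following is only a minimal sketch of the core idea: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, the parameter names, and the toy data are illustrative assumptions and are not taken from the study.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two made-up numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_points = smote(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic records.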

    Decision trees in epidemiological research

    Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups that share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression Tree (CART) technique and the newer Conditional Inference Tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel graphical visualization that provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
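
To make the partitioning idea concrete, here is a minimal sketch of a single CART-style split on one numeric covariate with a binary outcome, using Gini impurity as the split criterion. It is an illustration only: the helper names and the toy data are assumptions, and it does not implement CTree, which selects splits through statistical hypothesis tests rather than an impurity measure.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure subgroup)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best_threshold, best_score = None, float("inf")
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Made-up covariate (e.g. age in years) and binary outcome.
ages = [18, 22, 25, 31, 35, 40]
outcome = [0, 0, 0, 1, 1, 1]
threshold, score = best_split(ages, outcome)
print(threshold, score)
```

A full tree repeats this search recursively within each resulting subgroup and over every available covariate, stopping according to a pruning or stopping rule.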

    Identifying risk patterns for suicide attempts in individuals with diabetes : a data-driven approach using LASSO regression

    Diabetes is a major health concern in the United States, with 34.2 million Americans affected in 2020. Unfortunately, the risk of suicide is also elevated in individuals with diabetes, with around 90,000 people with diabetes committing suicide each year. People with type 1 diabetes are three to four times more likely to attempt suicide, and those with newly diagnosed type 2 diabetes are twice as likely to attempt suicide, compared to the general population. However, poor mental health comorbidity is still neglected, and more recommendations are needed to support people with diabetes. It is widely acknowledged that the comorbidity of depression with diabetes is a heightened risk factor for suicide attempts. Previous studies have used logistic regression to identify risk factors for suicide attempts in individuals with diabetes. However, this technique can be prone to overfitting when the number of variables is high. To address this issue, we used the LASSO (Least Absolute Shrinkage and Selection Operator), a regularization technique, to reduce overfitting in a logistic regression model. It works by adding a penalty term (λ) to the log-likelihood function, which shrinks the estimates of the coefficients. This process allows LASSO to act as a feature selection method, effectively setting coefficients that contribute most to the error to zero. Because few studies have focused on understanding the relationship between suicide attempts and diabetes, we used association rule mining (ARM), an explainable rule-based machine learning technique, for knowledge discovery to reveal previously unknown relationships between suicide attempts and diabetes. This approach has already proved useful in the medical field, where it has been applied to electronic health record (EHR) data to discover associations such as disease co-occurrences, drug-disease associations, and symptomatic patterns of disease.
However, no previous studies have used ARM to determine risk factors and predict suicide attempts in people with diabetes. The aim of this dissertation is to identify patterns of risk factors for suicide attempts in individuals with diabetes, with the long-term goal of developing a clinical decision support system that can be integrated into EHRs. This system would allow healthcare providers to identify patients with diabetes at high risk of suicide attempts and provide appropriate preventive measures during outpatient clinic visits. To achieve this goal, we have three specific aims: (1) to identify potential risk factors for suicide attempts in individuals with diabetes through a literature review; (2) to investigate risk factors for suicide attempts in individuals with diabetes using LASSO regression; (3) to identify risk patterns for suicide attempts in individuals with diabetes using association rule mining. In this dissertation, we have reviewed the literature and compiled a list of data elements for suicide attempts in people with diabetes. We then retrieved data on patients with diabetes from Cerner Real-World Data™. LASSO regression was used for feature selection, and ARM was used for investigating the risk patterns. We discovered risk patterns that are understandable and practical for healthcare providers. The findings of this research can inform suicide prevention efforts for people with diabetes and contribute to improved mental health outcomes. Includes bibliographical references.
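
The penalised fit described in the abstract above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it minimises the mean negative log-likelihood plus λ times the sum of absolute coefficient values by proximal gradient descent, and the function names, step size, iteration count, and toy data are all assumptions. The soft-thresholding step is what lets coefficients of weakly informative predictors land exactly on zero.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_l1_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Minimise mean negative log-likelihood + lam * sum(|w_j|) by proximal
    gradient descent. The intercept b is left unpenalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data: feature 0 is associated with the outcome, feature 1 is not.
X = [[1, 1], [1, -1], [1, 1], [1, -1], [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

w, b = fit_l1_logistic(X, y, lam=0.1)      # moderate penalty
w_big, _ = fit_l1_logistic(X, y, lam=0.5)  # heavy penalty
print("lam=0.1:", w)
print("lam=0.5:", w_big)
```

On this toy data the moderate penalty keeps a shrunken, non-zero coefficient for feature 0 and sets the coefficient for feature 1 exactly to zero, while the heavy penalty removes both. In practice λ is usually chosen by cross-validation.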

    Application of Machine Learning Techniques to Predict Teenage Obesity Using Earlier Childhood Measurements from Millennium Cohort Study

    Obesity is a major global concern, with more than 2.1 billion people overweight or obese worldwide, which amounts to almost 30% of the global population. If the current trend continues, the overweight and obese population is likely to increase to 41% by 2030. Individuals developing signs of weight gain or obesity are also at risk of developing serious illnesses such as type 2 diabetes, respiratory problems, heart disease, stroke, and even death. It is essential to detect childhood obesity as early as possible, since children who are overweight or obese at a younger age tend to stay obese in their adult lives. This research utilises the vast amount of data available via the UK's Millennium Cohort Study to construct a machine-learning-driven framework that predicts which young people are at risk of becoming overweight or obese. The focus of this paper is to develop a framework to predict childhood obesity using earlier childhood data and other relevant features. The use of a novel data-balancing technique and the inclusion of additional relevant features resulted in sensitivity, specificity, and F1-score of 77.32%, 76.81%, and 77.02% respectively. The proposed technique utilises easily obtainable features, making it suitable for use in clinical and non-clinical environments.
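
For readers unfamiliar with the three metrics reported above, the sketch below shows how sensitivity, specificity, and F1-score are computed from binary labels and predictions. The function name and the ten toy predictions are illustrative assumptions and have no connection to the study's data; the sketch also assumes both classes occur and that at least one positive prediction is made, so no division by zero arises.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and F1 for 0/1 labels (1 = at risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were detected
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}

# Made-up labels and predictions for ten individuals.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity side by side is informative on imbalanced data, where overall accuracy can look high even if the minority class is rarely detected.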