
    Machine Learning Approaches for the Prediction of Obesity using Publicly Available Genetic Profiles

    This paper presents a novel approach based on the analysis of genetic variants from publicly available genetic profiles and the manually curated database, the National Human Genome Research Institute Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then indexed as risk variants in the National Human Genome Research Institute Catalog. Indexed genetic variants, or single nucleotide polymorphisms (SNPs), are used as inputs in various machine learning algorithms for the prediction of obesity. Body mass index status of participants is divided into two classes, Normal Class and Risk Class. Dimensionality reduction tasks are performed to generate a set of principal variables - 13 SNPs - for the application of various machine learning methods. The models are evaluated using receiver operating characteristic curves and the area under the curve. Machine learning techniques including gradient boosting, generalized linear model, classification and regression trees, K-nearest neighbours, support vector machines, random forest and multilayer neural network are comparatively assessed in terms of their ability to identify the most important factors among the initial 6622 variables describing genetic variants, age and gender, to classify a subject into one of the body mass index related classes defined in this study. Our simulation results indicated that the support vector machine achieved a high accuracy of 90.5%.

    IoT Framework for a Decision-Making System of Obesity and Overweight Extrapolation among Children, Youths, and Adults

    Approximately 30% of the global population, about 2.1 billion people worldwide, is obese or overweight. This proportion is expected to surpass 40% by 2030 if the current trend continues. The global COVID-19 pandemic will also affect the predicted obesity rates and will cause a significant increase in morbidity and mortality worldwide. Multiple chronic diseases and several risk factors are associated with obesity. Understanding these risk factors and the prevalence of obesity involves various challenges. Diagnosing obesity in its initial stages might therefore significantly increase the patient’s chances of effective treatment. The Internet of Things (IoT) has reached an evolving stage in the development of the contemporary healthcare environment thanks to advancements in information and communication technologies. In this paper, we therefore thoroughly investigated machine learning techniques for building an IoT-enabled system. In the first phase, the proposed system analyzed the performance of random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), logistic regression (LR), and naïve Bayes (NB) algorithms on the obesity dataset. The second phase introduced an IoT-based framework that adopts a multi-user request system by uploading the data to the cloud for the early diagnosis of obesity. The IoT framework makes the system available to anyone, anywhere, for precise obesity categorization. This research will help the reader understand the relationships between risk factors and weight changes, along with their visualizations. Furthermore, it also focuses on how existing datasets can help one study the nature of obesity and which classification and regression models perform well compared with others.

    Estimation of obesity levels based on computational intelligence

    Obesity is a worldwide disease that affects people of all ages and genders; in consequence, researchers have made great efforts to identify early the factors that cause it. In this study, an intelligent method is created, based on supervised and unsupervised data mining techniques such as Simple K-Means, Decision Trees (DT), and Support Vector Machines (SVM), to detect obesity levels and help people and health professionals pursue a healthier lifestyle against this global epidemic. In this research, the primary data were collected from students between 18 and 25 years old at institutions in Colombia, Mexico, and Peru. The study takes a dataset relating to the main causes of obesity, with reference to high caloric intake, a decrease in energy expenditure due to the lack of physical activity, alimentary disorders, genetics, socioeconomic factors, and/or anxiety and depression. In the selected dataset, 178 students participated in the study, 81 male and 97 female. Using algorithms including Decision Tree, Support Vector Machine (SVM), and Simple K-Means, the results show a relevant tool to perform a comparative analysis among the mentioned algorithms.

    Using Machine Learning to Predict Obesity Based on Genome-Wide and Epigenome-Wide Gene-Gene and Gene-Diet Interactions.

    Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual's risk of obesity by better characterizing these complex relations and interactions focusing on dietary factors. For this purpose, we conducted a combined genome-wide and epigenome-wide scan for body mass index (BMI) and up to three-way interactions among 402,793 single nucleotide polymorphisms (SNPs), 415,202 DNA methylation sites (DMSs), and 397 dietary and lifestyle factors using the generalized multifactor dimensionality reduction (GMDR) method. The training set consisted of 1,573 participants in exam 8 of the Framingham Offspring Study (FOS) cohort. After identifying genetic, epigenetic, and dietary factors that passed statistical significance, we applied machine learning (ML) algorithms to predict participants' obesity status in the test set, taken as a subset of independent samples (n = 394) from the same cohort. The quality and accuracy of prediction models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC). GMDR identified 213 SNPs, 530 DMSs, and 49 dietary and lifestyle factors as significant predictors of obesity. Comparing several ML algorithms, we found that the stochastic gradient boosting model provided the best prediction accuracy for obesity with an overall accuracy of 70%, with ROC-AUC of 0.72 in test set samples. Top predictors of the best-fit model were 21 SNPs, 230 DMSs in genes such as CPT1A, ABCG1, SLC7A11, RNF145, and SREBF1, and 26 dietary factors, including processed meat, diet soda, French fries, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components, such as calcium and flavonols. In conclusion, we developed an integrated approach with ML to predict obesity using omics and dietary data. 
    This extends our knowledge of the drivers of obesity, which can inform precision nutrition strategies for the prevention and treatment of obesity. Clinical Trial Registration: [www.ClinicalTrials.gov], the Framingham Heart Study (FHS), [NCT00005121]. This research was funded by the United States Department of Agriculture (USDA), Agriculture Research Service (ARS) under agreement no. 8050-51000-107-000D. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. The USDA is an equal opportunity provider and employer. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the USDA.

    Computational Methods for the Analysis of Genomic Data and Biological Processes

    In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

    Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders

    BACKGROUND: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. METHODOLOGY/PRINCIPAL FINDINGS: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8-12 kg weight loss) could always be differentiated from those of non-responders (<4 kg weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1%+/-8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9%+/-2.2%. CONCLUSION: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition.

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are the essential disciplines heavily involved in translating data-driven findings into medical practice. This task is accomplished in particular by developing computational tools and algorithms that assist in the detection and clarification of the underlying causes of diseases. The continuous advancements in high-throughput technologies, coupled with the recently promoted data sharing policies, have contributed to a massive wealth of data with remarkable potential to improve human health care. In concordance with this massive boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data categories. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate the existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches in clinical decision making by improving patient risk stratification and prediction of disease outcomes. This thesis comprises five studies explaining method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is the detection of differentially expressed genes as the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. In order to prove the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). 
    In addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the importance of including genomic data for the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice.

    Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa

    Machine learning-based classification approaches are widely used to predict host phenotypes from microbiome data. Classifiers are typically employed by considering operational taxonomic units or relative abundance profiles as input features. Such types of data are intrinsically sparse, which opens the opportunity to make predictions from the presence/absence rather than the relative abundance of microbial taxa. This also poses the question of whether it is the presence rather than the abundance of particular taxa that is relevant for discrimination purposes, an aspect that has so far been overlooked in the literature. In this paper, we aim to fill this gap by performing a meta-analysis on 4,128 publicly available metagenomes associated with multiple case-control studies. At species-level taxonomic resolution, we show that it is the presence rather than the relative abundance of specific microbial taxa that is important when building classification models. Such findings are robust to the choice of the classifier and confirmed by statistical tests applied to identifying differentially abundant/present taxa. Results are further confirmed at coarser taxonomic resolutions and validated on 4,026 additional 16S rRNA samples coming from 30 public case-control studies.