
    Proteomic signatures for identification of impaired glucose tolerance

    The implementation of recommendations for type 2 diabetes (T2D) screening and diagnosis focuses on the measurement of glycated hemoglobin (HbA1c) and fasting glucose. This approach leaves a large number of individuals with isolated impaired glucose tolerance (iIGT), who are only detectable through oral glucose tolerance tests (OGTTs), at risk of diabetes and its severe complications. We applied machine learning to the proteomic profiles of a single fasted sample from 11,546 participants of the Fenland study to test discrimination of iIGT defined using the gold-standard OGTTs. We observed significantly improved discriminative performance by adding only three proteins (RTN4R, CBPM and GHR) to the best clinical model (AUROC = 0.80 (95% confidence interval: 0.79–0.86), P = 0.004), which we validated in an external cohort. Increased plasma levels of these candidate proteins were associated with an increased risk for future T2D in an independent cohort and were also increased in individuals genetically susceptible to impaired glucose homeostasis and T2D. Assessment of a limited number of proteins can identify individuals likely to be missed by current diagnostic strategies and at high risk of T2D and its complications.
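
    As a minimal, hedged sketch of the modeling step described above, the example below compares a clinical-only classifier with one augmented by the three candidate proteins, using held-out AUROC. The file name, clinical covariates and choice of logistic regression are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: does adding three plasma proteins improve discrimination of isolated IGT
# over a clinical-only model? (File and column names are hypothetical.)
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("fasted_plasma_samples.csv")                 # hypothetical input table
clinical = ["age", "sex", "bmi", "fasting_glucose", "hba1c"]  # assumed clinical covariates
proteins = ["RTN4R", "CBPM", "GHR"]

X_train, X_test, y_train, y_test = train_test_split(
    df[clinical + proteins], df["iIGT"], test_size=0.3, stratify=df["iIGT"], random_state=0
)

def held_out_auroc(features):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train[features], y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test[features])[:, 1])

print(f"clinical only:          AUROC = {held_out_auroc(clinical):.3f}")
print(f"clinical + 3 proteins:  AUROC = {held_out_auroc(clinical + proteins):.3f}")
```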

    Elucidating causal relationships between energy homeostasis and cardiometabolic outcomes

    Energy metabolism dyshomeostasis is associated with multiple health problems. For example, abundant epidemiological data show that obesity and overweight increase the risk of cardiometabolic diseases and early mortality. Type 2 diabetes (T2D), characterized by chronically elevated blood glucose, is also associated with debilitating complications, high healthcare costs and mortality, with cardiovascular complications accounting for more than half of T2D-related deaths. Prediabetes, which is defined as elevated blood glucose below the diagnostic threshold for T2D, affects approximately 350 million people worldwide, with about 35–50% developing T2D within 5 years. Further, non-alcoholic fatty liver disease, a form of ectopic fat deposition resulting from energy imbalance, is associated with increased risk of T2D, cardiovascular disease (CVD) and hepatocellular carcinoma. Determination of causal relationships between phenotypes related to positive energy balance and disease outcomes, as well as elucidation of the nature of these relationships, may help inform public health intervention policies. In addition, utilizing big data and machine learning (ML) approaches can improve prediction of outcomes related to excess adiposity, both for research purposes and for eventual validation and clinical translation. Aims: In paper 1, I set out to summarize observational evidence and further determine the causal relationships between prediabetes and common vascular complications associated with T2D, i.e., coronary artery disease (CAD), stroke and renal disease. In paper 2, I studied the association between LRIG1 genetic variants and BMI, T2D and lipid biomarkers. In paper 3, we used ML to identify novel molecular features associated with non-alcoholic fatty liver disease (NAFLD). In paper 4, I elucidated the nature of the causal relationships between BMI and cardiometabolic traits and investigated sex differences within the causal framework. Results: Prediabetes was associated with CAD and stroke but not renal disease in observational analyses, whilst in the causal inference analyses, prediabetes was only associated with CAD. A common LRIG1 variant (rs4856886) was associated with increased BMI and lipid hyperplasia but a decreased risk of T2D. In paper 3, models using common clinical variables showed strong NAFLD prediction ability (ROCAUC = 0.73, p < 0.001); addition of hepatic and glycemic biomarkers and omics data to these models strengthened predictive power (ROCAUC = 0.84, p < 0.001). Finally, there was evidence of non-linearity in the causal effect of BMI on T2D and CAD, biomarkers and blood pressure. The causal effects of BMI on CAD differed between men and women, though this difference did not hold after Bonferroni correction. Conclusion: We show that derangements in energy homeostasis are causally associated with increased risk of cardiometabolic outcomes and that early intervention on perturbed glucose control and excess adiposity may help prevent these adverse health outcomes. In addition, the effects of novel LRIG1 genetic variants on BMI and T2D might enrich our understanding of lipid metabolism and T2D and thus warrant further investigation. Finally, application of ML to multidimensional data improves prediction of NAFLD; similar approaches could be used in other disease research.
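
    The abstract does not name the causal-inference method used in papers 1 and 4; a common choice for this kind of question is two-sample Mendelian randomization with an inverse-variance-weighted (IVW) estimator. The sketch below shows that estimator purely as an illustration; the file and column names are hypothetical.

```python
# Illustrative IVW Mendelian randomization estimate of a causal effect
# (e.g. a glycemic exposure on coronary artery disease) from per-SNP summary statistics.
import numpy as np
import pandas as pd

snps = pd.read_csv("mr_summary_stats.csv")   # hypothetical columns: beta_exp, beta_out, se_out

w = 1.0 / snps["se_out"] ** 2                # inverse-variance weights from outcome SEs
ivw_beta = np.sum(w * snps["beta_exp"] * snps["beta_out"]) / np.sum(w * snps["beta_exp"] ** 2)
ivw_se = np.sqrt(1.0 / np.sum(w * snps["beta_exp"] ** 2))
print(f"IVW causal estimate per unit of exposure: {ivw_beta:.3f} (SE {ivw_se:.3f})")
```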

    Quantification of glycated hemoglobin and glucose in vivo using Raman spectroscopy and artificial neural networks

    Undiagnosed type 2 diabetes (T2D) remains a major public health concern. Globally, an estimated 46% of diabetes cases are undiagnosed, and the situation is even more critical in developing countries. We therefore proposed a non-invasive method to quantify glycated hemoglobin (HbA1c) and glucose in vivo. We developed a technique based on Raman spectroscopy, RReliefF as a feature selection method, and regression based on feed-forward artificial neural networks (FFNN). The spectra were obtained from the forearm, wrist, and index finger of 46 individuals. The FFNN allowed us to achieve a predictive error of 0.69% for HbA1c and 30.12 mg/dL for glucose. Patients were classified according to HbA1c values into three categories: healthy, prediabetes, and T2D. The proposed method obtained a specificity and sensitivity of 87.50% and 80.77%, respectively. This work demonstrates the benefit of using artificial neural networks and feature selection techniques to enhance Raman spectra processing for determining glycated hemoglobin and glucose in patients with undiagnosed T2D.
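
    A minimal sketch of the regression step is shown below. scikit-learn has no RReliefF implementation, so a univariate filter stands in for it; the file name, column layout and network size are assumptions rather than the settings used in the paper.

```python
# Sketch: predict HbA1c from Raman spectral features with a feed-forward neural network.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

spectra = pd.read_csv("raman_spectra.csv")    # hypothetical: one column per Raman shift + 'hba1c'
X, y = spectra.drop(columns=["hba1c"]), spectra["hba1c"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=50),          # stand-in for RReliefF feature selection
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("hold-out MAE for HbA1c (%):", mean_absolute_error(y_test, model.predict(X_test)))
```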

    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite its tremendous success, pitfalls have been observed in every step of the clinical metabolomics workflow, which impede the internal validity of such studies. Furthermore, the demand for logistics, instrumentation, and computational resources for metabolic phenotyping studies has far exceeded our expectations. In this conceptual review, we cover the barriers encountered across a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical workflow containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. In addition, we elucidate the potential involvement of machine learning and demonstrate that the need for automated data mining algorithms to improve the quality of future research is undeniable. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar member, is urgently needed. When combined with other social or nutritional factors, such data can provide complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions toward the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.
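
    To make one of the quality-control procedures discussed here concrete, the sketch below flags metabolite features whose relative standard deviation across pooled QC injections exceeds a tolerance. The 30% cutoff, file name and column layout are illustrative assumptions, not a recommendation from the review.

```python
# Sketch: drop metabolite features that are unstable across pooled QC injections.
import pandas as pd

peaks = pd.read_csv("metabolite_peak_table.csv")   # hypothetical: samples x features + 'sample_type'
qc = peaks[peaks["sample_type"] == "pooled_QC"].drop(columns=["sample_type"])

rsd = 100 * qc.std() / qc.mean()                   # relative standard deviation per feature (%)
keep = rsd[rsd <= 30].index                        # 30% is a common, not universal, threshold
clean = peaks.loc[peaks["sample_type"] != "pooled_QC", keep]
print(f"retained {len(keep)} of {qc.shape[1]} features after QC-based RSD filtering")
```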

    Predicting long-term type 2 diabetes with support vector machine using oral glucose tolerance test

    Diabetes is a large healthcare burden worldwide. There is substantial evidence that lifestyle modifications and drug intervention can prevent diabetes; therefore, early identification of high-risk individuals is important for designing targeted prevention strategies. In this paper, we present an automatic tool that uses machine learning techniques to predict the development of type 2 diabetes mellitus (T2DM). Data generated from an oral glucose tolerance test (OGTT) were used to develop a predictive model based on the support vector machine (SVM). We trained and validated the models using the OGTT and demographic data of 1,492 healthy individuals collected during the San Antonio Heart Study. This study collected plasma glucose and insulin concentrations before glucose intake and at three time-points thereafter (30, 60 and 120 min). Furthermore, personal information such as age, ethnicity and body-mass index was also part of the data set. Using the 11 OGTT measurements, we derived 61 features, which were ranked, and the top ten features were shortlisted using the minimum redundancy maximum relevance (mRMR) feature selection algorithm. All possible combinations of the 10 best-ranked features were used to generate SVM-based prediction models. This research shows that an individual’s plasma glucose levels, and the information derived therefrom, have the strongest predictive performance for the future development of T2DM. Significantly, insulin and demographic features do not provide additional performance improvement for diabetes prediction. The results of this work identify the parsimonious clinical data that need to be collected for an efficient prediction of T2DM. Our approach shows an average accuracy of 96.80% and a sensitivity of 80.09% obtained on a holdout set.
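
    The sketch below approximates this pipeline: a greedy minimum-redundancy-maximum-relevance ranking of OGTT-derived features followed by an SVM. The redundancy measure (absolute Pearson correlation), kernel, file and column names are assumptions, not the authors' exact settings.

```python
# Sketch: mRMR-style feature ranking followed by an SVM classifier for future T2DM.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("ogtt_derived_features.csv")        # hypothetical: 61 derived features + 't2dm'
X, y = df.drop(columns=["t2dm"]), df["t2dm"]

def mrmr_rank(X, y, n_keep=10):
    """Greedy mRMR: maximize relevance (mutual information) minus redundancy (mean |corr|)."""
    relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
    corr = X.corr().abs()
    selected = [relevance.idxmax()]
    while len(selected) < n_keep:
        remaining = [c for c in X.columns if c not in selected]
        redundancy = corr.loc[remaining, selected].mean(axis=1)
        selected.append((relevance[remaining] - redundancy).idxmax())
    return selected

top10 = mrmr_rank(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X[top10], y, test_size=0.3, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), " sensitivity:", recall_score(y_te, pred))
```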

    Utilizing Temporal Information in The EHR for Developing a Novel Continuous Prediction Model

    Type 2 diabetes mellitus (T2DM) is a chronic condition that is prevalent nationwide and carries both direct and indirect healthcare costs. Previous clinical research, however, has shown that T2DM is preventable. Many prediction models were based on the risk factors identified by clinical trials. Because nationwide screening is not cost-effective, one of the major tasks of T2DM prediction models is to estimate risk and flag patients for further testing with HbA1c or fasting plasma glucose to determine whether they have T2DM. Those models also faced substantial data-quality limitations, such as missing values. In this dissertation, I first tested conventional models based on the most widely used risk factors to predict the probability of developing T2DM. The average AUC was 0.5, which implies that the conventional models cannot be used to screen for T2DM risk. Based on this result, I implemented three representations for building the T2DM prediction model: non-temporal, interval-temporal, and continuous-temporal. The continuous-temporal representation, which was based on deep learning methods, had the best performance, implying that deep learning can overcome the data-quality issues and achieve better performance. This dissertation also contributes a continuous risk-output model based on a seq2seq architecture. This model can generate a monotonically increasing function for a given patient to predict the future probability of developing T2DM. The model is workable but still has many limitations to overcome. Finally, this dissertation highlights some risk factors that are underestimated and warrant further research to revise the current T2DM screening guideline. These results are still preliminary and will require collaboration with epidemiologists and experts in other fields to verify the findings. In the future, the methods for building a T2DM prediction model can also be applied to prediction models for other chronic conditions.
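
    To illustrate the idea of a continuous, monotonically increasing risk output, the sketch below uses a recurrent network over a patient's visit sequence that emits non-negative hazard increments whose running sum can only grow over time. This is a minimal stand-in for the dissertation's seq2seq model; the layer sizes and the softplus/cumulative-sum construction are assumptions.

```python
# Sketch: a recurrent model whose predicted T2DM risk is non-decreasing over time.
import torch
import torch.nn as nn

class MonotonicRiskRNN(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, visits):                  # visits: (batch, time, n_features)
        states, _ = self.rnn(visits)
        increments = nn.functional.softplus(self.head(states)).squeeze(-1)  # >= 0
        cumulative = torch.cumsum(increments, dim=1)                        # monotone in time
        return 1.0 - torch.exp(-cumulative)                                 # risk in (0, 1)

model = MonotonicRiskRNN(n_features=12)
risk_curve = model(torch.randn(4, 20, 12))      # 4 synthetic patients, 20 time steps each
print(risk_curve.shape)                         # torch.Size([4, 20]), non-decreasing along time
```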

    Exploration of Machine Learning and Statistical Techniques in Development of a Low-Cost Screening Method Featuring the Global Diet Quality Score for Detecting Prediabetes in Rural India.

    BACKGROUND: The prevalence of type 2 diabetes has increased substantially in India over the past 3 decades. Undiagnosed diabetes presents a public health challenge, especially in rural areas, where access to laboratory testing for diagnosis may not be readily available. OBJECTIVES: The present work explores the use of several machine learning and statistical methods in the development of a predictive tool to screen for prediabetes using survey data from a food-frequency questionnaire (FFQ) to compute the Global Diet Quality Score (GDQS). METHODS: The outcome variable prediabetes status (yes/no) used throughout this study was determined based upon a fasting blood glucose measurement ≥100 mg/dL. The algorithms utilized included the generalized linear model (GLM), random forest, least absolute shrinkage and selection operator (LASSO), elastic net (EN), and generalized linear mixed model (GLMM) with family unit as a (cluster) random (intercept) effect to account for intrafamily correlation. Model performance was assessed on held-out test data, and comparisons made with respect to area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. RESULTS: The GLMM, GLM, LASSO, and random forest modeling techniques each performed quite well (AUCs >0.70) and included the GDQS food groups and age, among other predictors. The fully adjusted GLMM, which included a random intercept for family unit, achieved slightly superior results (AUC of 0.72) in classifying the prediabetes outcome in these cluster-correlated data. CONCLUSIONS: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance. In addition, future studies to examine the utility of the GDQS in screening for other noncommunicable diseases are recommended.
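
    A minimal sketch of the model comparison is given below for the GLM, LASSO and random forest; the GLMM with a family-unit random intercept is omitted because it needs a mixed-model tool (e.g. statsmodels or R's lme4). File and column names are assumptions for illustration.

```python
# Sketch: compare three prediabetes classifiers on held-out data by AUC, sensitivity, specificity.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("gdqs_survey.csv")             # hypothetical FFQ/GDQS survey table
X = df[[c for c in df.columns if c.startswith("gdqs_")] + ["age"]]
y = df["prediabetes"]                           # 1 if fasting glucose >= 100 mg/dL
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "GLM": LogisticRegression(max_iter=2000),
    "LASSO": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    tn, fp, fn, tp = confusion_matrix(y_te, m.predict(X_te)).ravel()
    print(f"{name}: AUC={auc:.2f}  sensitivity={tp/(tp+fn):.2f}  specificity={tn/(tn+fp):.2f}")
```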

    Machine learning for data integration in human gut microbiome

    Recent studies have demonstrated that gut microbiota plays critical roles in various human diseases. High-throughput technology has been widely applied to characterize the microbial ecosystems, which has led to an explosion of different types of molecular profiling data, such as metagenomics, metatranscriptomics and metabolomics. For the analysis of such data, machine learning algorithms have been shown to be useful for identifying key molecular signatures, discovering potential patient stratifications, and particularly for generating models that can accurately predict phenotypes. In this review, we first discuss how dysbiosis of the intestinal microbiota is linked to human disease development and how potential modulation strategies of the gut microbial ecosystem can be used for disease treatment. In addition, we introduce the categories and workflows of different machine learning approaches and how they can be used to perform integrative analysis of multi-omics data. Finally, we review advances of machine learning in gut microbiome applications and discuss the related challenges. Based on this, we conclude that machine learning is very well suited for the analysis of gut microbiome data and that these approaches can be useful for the development of gut microbe-targeted therapies, which ultimately can help in achieving personalized and precision medicine.
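
    As one concrete example of the integrative analyses the review surveys, the sketch below uses the simplest "early integration" strategy: standardize each omics block, concatenate, and train a single phenotype classifier. The file names and the random-forest choice are illustrative assumptions, not a method prescribed by the review.

```python
# Sketch: early integration of two omics blocks for phenotype prediction.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

metagenomics = pd.read_csv("metagenomic_abundances.csv", index_col=0)    # hypothetical: samples x taxa
metabolomics = pd.read_csv("metabolite_levels.csv", index_col=0)         # hypothetical: samples x metabolites
phenotype = pd.read_csv("phenotype.csv", index_col=0)["disease_status"]  # assumed binary label

blocks = [pd.DataFrame(StandardScaler().fit_transform(b), index=b.index, columns=b.columns)
          for b in (metagenomics, metabolomics)]
X = pd.concat(blocks, axis=1).loc[phenotype.index]

clf = RandomForestClassifier(n_estimators=500, random_state=0)
print("cross-validated AUC:", cross_val_score(clf, X, phenotype, scoring="roc_auc", cv=5).mean())
```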

    Non-communicable Diseases, Big Data and Artificial Intelligence

    This reprint includes 15 articles in the field of non-communicable diseases, big data, and artificial intelligence, providing an overview of the most recent advances in AI and their application potential in 3P (predictive, preventive, and personalized) medicine.