47 research outputs found

    A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification

    Introduction: Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction models. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be more suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models. Objectives: We hypothesise that for binary classification using metabolomics data, non-linear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard PLS discriminant analysis. Methods: We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks. Results: There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of uncertainty of model prediction such that the quality of metabolomics data was observed to be a bigger influence on generalised performance than model choice. Conclusion: The size of the data set, and choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm.
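
The out-of-bag bootstrap confidence intervals mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the toy centroid classifier, the synthetic data, the percentile interval, and all function names are assumptions made for illustration only.

```python
import random

def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_fit(X, y):
    """Toy linear classifier (hypothetical stand-in for PLS-DA, SVM, ANN, ...)."""
    dims = range(len(X[0]))
    def mean(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[d] for r in rows) / len(rows) for d in dims]
    m0, m1 = mean(0), mean(1)
    w = [b - a for a, b in zip(m0, m1)]
    # Score = projection onto the difference of class centroids.
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def oob_bootstrap_auc(X, y, n_boot=200, seed=0):
    """Train on each bootstrap resample, score the samples left out of it."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # in-bag, with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        y_tr = [y[i] for i in idx]
        y_te = [y[i] for i in oob]
        if len(set(y_tr)) < 2 or len(set(y_te)) < 2:
            continue  # skip resamples that lack one of the two classes
        score = centroid_fit([X[i] for i in idx], y_tr)
        aucs.append(auc(y_te, [score(X[i]) for i in oob]))
    aucs.sort()
    lo = aucs[int(0.025 * (len(aucs) - 1))]   # percentile interval (assumption)
    hi = aucs[int(0.975 * (len(aucs) - 1))]
    return aucs[len(aucs) // 2], lo, hi

# Synthetic two-class data: one informative feature, one noise feature.
rng = random.Random(1)
y = [0] * 30 + [1] * 30
X = [[rng.gauss(1.5 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]
median_auc, ci_low, ci_high = oob_bootstrap_auc(X, y)
print(f"OOB AUC median={median_auc:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Running the same resampling loop for each candidate model gives intervals that can be compared directly; heavily overlapping intervals would suggest that model choice matters less than the data, which is the pattern the abstract reports.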

    Migrating from partial least squares discriminant analysis to artificial neural networks: A comparison of functionally equivalent visualisation and feature contribution tools using Jupyter Notebooks

    Introduction: Metabolomics data is commonly modelled multivariately using partial least squares discriminant analysis (PLS-DA). Its success is primarily due to ease of interpretation, through projection to latent structures, and transparent assessment of feature importance using regression coefficients and Variable Importance in Projection scores. In recent years several non-linear machine learning (ML) methods have grown in popularity but with limited uptake, essentially due to convoluted optimisation and interpretation. Artificial neural networks (ANNs) are a non-linear projection-based ML method that shares a structural equivalence with PLS, and as such should be amenable to equivalent optimisation and interpretation methods. Objectives: We hypothesise that standardised optimisation, visualisation, evaluation and statistical inference techniques commonly used by metabolomics researchers for PLS-DA can be migrated to a non-linear, single hidden layer, ANN. Methods: We compared a standardised workflow of optimisation, visualisation, evaluation and statistical inference techniques for PLS with the proposed ANN workflow. Both workflows were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks on GitHub. Results: The migration of the PLS workflow to a non-linear, single hidden layer, ANN was successful. There was a similarity in significant metabolites determined using PLS model coefficients and the ANN Connection Weight Approach. Conclusion: We have shown that it is possible to migrate the standardised PLS-DA workflow to simple non-linear ANNs. This result opens the door for more widespread use and to the investigation of transparent interpretation of more complex ANN architectures.
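
As a rough illustration of the feature contribution idea in the abstract above, the sketch below assumes that the Connection Weight Approach is the product-of-weights method commonly attributed to Olden and Jackson: for a single hidden layer network, each input's contribution is the sum, over hidden nodes, of the input-to-hidden weight multiplied by the hidden-to-output weight. The function name and the weight values are hypothetical.

```python
def connection_weights(w_input_hidden, w_hidden_output):
    """Contribution of input i = sum over hidden nodes j of
    w_input_hidden[i][j] * w_hidden_output[j]."""
    return [
        sum(w_ih * w_ho for w_ih, w_ho in zip(row, w_hidden_output))
        for row in w_input_hidden
    ]

# Hypothetical weights: 3 input metabolites, 2 hidden nodes, 1 output node.
w_ih = [[0.5, -1.0],
        [2.0, 0.5],
        [0.0, 0.1]]
w_ho = [1.0, -2.0]

importance = connection_weights(w_ih, w_ho)
for name, value in zip(["metabolite_A", "metabolite_B", "metabolite_C"], importance):
    print(name, round(value, 3))
```

The sign of each value indicates the direction of the input's overall effect on the output, which is what makes the result comparable in spirit to signed PLS regression coefficients.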

    Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

    Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike. Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

    Protection of specific maternal messenger RNAs by the P body protein CGH-1 (Dhh1/RCK) during Caenorhabditis elegans oogenesis

    During oogenesis, numerous messenger RNAs (mRNAs) are maintained in a translationally silenced state. In eukaryotic cells, various translation inhibition and mRNA degradation mechanisms congregate in cytoplasmic processing bodies (P bodies). The P body protein Dhh1 inhibits translation and promotes decapping-mediated mRNA decay together with Pat1 in yeast, and has been implicated in mRNA storage in metazoan oocytes. Here, we have investigated in Caenorhabditis elegans whether Dhh1 and Pat1 generally function together, and how they influence mRNA sequestration during oogenesis. We show that in somatic tissues, the Dhh1 orthologue (CGH-1) forms Pat1 (patr-1)-dependent P bodies that are involved in mRNA decapping. In contrast, during oogenesis, CGH-1 forms patr-1–independent mRNA storage bodies. CGH-1 then associates with translational regulators and a specific set of maternal mRNAs, and prevents those mRNAs from being degraded. Our results identify somatic and germ cell CGH-1 functions that are distinguished by the involvement of PATR-1, and reveal that during oogenesis, numerous translationally regulated mRNAs are specifically protected by a CGH-1–dependent mechanism

    Metabolomics reveals mouse plasma metabolite responses to acute exercise and effects of disrupting AMPK-glycogen interactions

    Introduction: The AMP-activated protein kinase (AMPK) is a master regulator of energy homeostasis that becomes activated by exercise and binds glycogen, an important energy store required to meet exercise-induced energy demands. Disruption of AMPK-glycogen interactions in mice reduces exercise capacity and impairs whole-body metabolism. However, the mechanisms underlying these phenotypic effects at rest and following exercise are unknown. Furthermore, the plasma metabolite responses to an acute exercise challenge in mice remain largely uncharacterized. Methods: Plasma samples were collected from wild type (WT) and AMPK double knock-in (DKI) mice with disrupted AMPK-glycogen binding at rest and following 30-min submaximal treadmill running. An untargeted metabolomics approach was utilized to determine the breadth of plasma metabolite changes occurring in response to acute exercise and the effects of disrupting AMPK-glycogen binding. Results: Relative to WT mice, DKI mice had reduced maximal running speed (p < 0.0001) concomitant with increased body mass (p < 0.01) and adiposity (p < 0.001). A total of 83 plasma metabolites were identified/annotated, with 17 metabolites significantly different (p < 0.05; FDR < 0.1) in exercised (↑6; ↓11) versus rested mice, including amino acids, acylcarnitines and steroid hormones. Pantothenic acid was reduced in DKI mice versus WT. Distinct plasma metabolite profiles were observed between the rest and exercise conditions and between WT and DKI mice at rest, while metabolite profiles of both genotypes converged following exercise. These differences in metabolite profiles were primarily explained by exercise-associated increases in acylcarnitines and steroid hormones as well as decreases in amino acids and derivatives following exercise. DKI plasma showed greater decreases in amino acids following exercise versus WT. Conclusion: This is the first study to map mouse plasma metabolomic changes following a bout of acute exercise in WT mice and the effects of disrupting AMPK-glycogen interactions in DKI mice. Untargeted metabolomics revealed alterations in metabolite profiles between rested and exercised mice in both genotypes, and between genotypes at rest. This study has uncovered known and previously unreported plasma metabolite responses to acute exercise in WT mice, as well as greater decreases in amino acids following exercise in DKI plasma. Reduced pantothenic acid levels may contribute to differences in fuel utilization in DKI mice.
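
The combined significance criterion quoted in the abstract above (p < 0.05 together with FDR < 0.1) can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five metabolites.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)

# A metabolite is flagged only if it passes both thresholds.
significant = [pi < 0.05 and qi < 0.1 for pi, qi in zip(p, q)]
print(list(zip(p, [round(v, 5) for v in q], significant)))
```

Applying both thresholds keeps only features that are nominally significant and also survive the multiple-testing correction across all tested metabolites.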

    Changes to the gut microbiome in young children showing early behavioral signs of autism

    The human gut microbiome has increasingly been associated with autism spectrum disorder (ASD), which is a neurological developmental disorder, characterized by impairments to social interaction. The ability of the gut microbiota to signal across the gut-brain-microbiota axis with metabolites, including short-chain fatty acids, impacts brain health and has been identified to play a role in the gastrointestinal and developmental symptoms affecting autistic children. The fecal microbiome of older children with ASD has repeatedly shown particular shifts in the bacterial and fungal microbial community, which are significantly different from age-matched neurotypical controls, but it is still unclear whether these characteristic shifts are detectable before diagnosis. Early microbial colonization patterns can have long-lasting effects on human health, and pre-emptive intervention may be an important mediator to more severe autism. In this study, we characterized both the microbiome and short-chain fatty acid concentrations of fecal samples from young children between 21 and 40 months who were showing early behavioral signs of ASD. The fungal richness and acetic acid concentrations were observed to be higher with increasing autism severity, and the abundance of several bacterial taxa also changed due to the severity of ASD. Bacterial diversity and SCFA concentrations were also associated with stool form, and some bacterial families were found with differential abundance according to stool firmness. An exploratory analysis of the microbiome associated with pre-emptive treatment also showed significant differences at multiple taxonomic levels. These differences may impact the microbial signaling across the gut-brain-microbiota axis and the neurological development of the children

    Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies

    Background: Quality assurance (QA) and quality control (QC) are two quality management processes that are integral to the success of metabolomics including their application for the acquisition of high quality data in any high-throughput analytical chemistry laboratory. QA defines all the planned and systematic activities implemented before samples are collected, to provide confidence that a subsequent analytical process will fulfil predetermined requirements for quality. QC can be defined as the operational techniques and activities used to measure and report these quality requirements after data acquisition. Aim of review: This tutorial review will guide the reader through the use of system suitability and QC samples, why these samples should be applied and how the quality of data can be reported. Key scientific concepts of review: System suitability samples are applied to assess the operation and lack of contamination of the analytical platform prior to sample analysis. Isotopically-labelled internal standards are applied to assess system stability for each sample analysed. Pooled QC samples are applied to condition the analytical platform, perform intra-study reproducibility measurements (QC) and to correct mathematically for systematic errors. Standard reference materials and long-term reference QC samples are applied for inter-study and inter-laboratory assessment of data.

    Lipopolysaccharide-induced interferon response networks at birth are predictive of severe viral lower respiratory infections in the first year of life

    Appropriate innate immune function is essential to limit pathogenesis and severity of severe lower respiratory infections (sLRI) during infancy, a leading cause of hospitalization and risk factor for subsequent asthma in this age group. Employing a systems biology approach to analysis of multi-omic profiles generated from a high-risk cohort (n = 50), we found that the intensity of activation of an LPS-induced interferon gene network at birth was predictive of sLRI risk in infancy (AUC = 0.724). Connectivity patterns within this network were stronger among susceptible individuals, and a systems biology approach identified IRF1 as a putative master regulator of this response. These findings were specific to the LPS-induced interferon response and were not observed following activation of viral nucleic acid sensing pathways. Comparison of responses at birth versus age 5 demonstrated that LPS-induced interferon responses, but not responses triggered by viral nucleic acid sensing pathways, may be subject to strong developmental regulation. These data suggest that the risk of sLRI in early life is in part already determined at birth, and additionally that the developmental status of LPS-induced interferon responses may be a key determinant of susceptibility. Our findings provide a rationale for the identification of at-risk infants for early intervention aimed at sLRI prevention and identify targets which may be relevant for drug development.