    Statistical aspects of omics data analysis using the random compound covariate

    Multivariable association discovery in population-scale meta-omics studies.

    It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata with microbial community measurements, due in part to the quantitative properties of these measurements. Microbiome multi-omics data are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2), which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
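    The core idea of fitting one linear model per microbial feature against the metadata and then controlling false discovery across features can be sketched as below. MaAsLin 2 itself is an R package with its own normalization, transformation, and mixed-model options; this Python sketch is not its implementation. It assumes relative abundances, a log transform with a small pseudocount, ordinary least squares without random effects, a large-sample normal approximation for p-values, and Benjamini-Hochberg adjustment. The function names and the synthetic data are hypothetical.

```python
import math

import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (q-values) for false discovery control."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Step-up procedure: running minimum taken from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def per_feature_lm(rel_abund, metadata, pseudocount=1e-5):
    """Fit one OLS model per feature: log(abundance) ~ intercept + metadata.

    rel_abund: (n_samples, n_features) relative abundances.
    metadata:  (n_samples, n_covariates); column 0 is the variable of interest,
               the other columns are adjustment covariates.
    Returns (coefficient, p-value, q-value) for metadata column 0, per feature.
    """
    n = metadata.shape[0]
    X = np.column_stack([np.ones(n), metadata])      # design matrix with intercept
    Y = np.log(rel_abund + pseudocount)              # assumed log transform
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # all features fitted at once
    resid = Y - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof          # residual variance per feature
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    # Large-sample normal approximation to the two-sided t-test p-value.
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return beta[1], p, bh_adjust(p)


# Synthetic example: feature 0 is truly enriched in "cases".
rng = np.random.default_rng(0)
n_samples, n_features = 200, 50
case = rng.integers(0, 2, n_samples).astype(float)
age = rng.normal(50.0, 10.0, n_samples)              # hypothetical covariate
log_counts = rng.normal(0.0, 1.0, (n_samples, n_features))
log_counts[:, 0] += 1.5 * case
abund = np.exp(log_counts)
abund /= abund.sum(axis=1, keepdims=True)            # total-sum scaling
coef, pval, qval = per_feature_lm(abund, np.column_stack([case, age]))
```

    Fitting all features with a single `lstsq` call keeps the sketch short; repeated measures would need a mixed model, which is omitted here.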

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to better understand the variability between tumors and disease models such as cell lines. To maximize the translatability and clinical relevance of in vitro studies, the selection of optimal cancer models is imperative. We have developed a deep learning-based method to measure the similarity between CRC tumors and disease models such as cancer cell lines. Our method efficiently leverages multi-omics data sets containing copy number alterations, gene expression, and point mutations, and learns latent factors that describe the data in lower dimensions. These latent factors represent clinically relevant patterns and explain the variability of molecular profiles across tumors and cell lines. Using these factors, we propose refined CRC subtypes and identify the best-matching cell lines for the different subtypes. These findings are relevant to patient stratification and to the selection of cell lines for early-stage drug discovery pipelines, biomarker discovery, and target identification.
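    The matching step (embed tumors and cell lines in a shared low-dimensional space, then rank cell lines by similarity to each tumor subtype) can be sketched as follows. The abstract describes a deep learning model; this sketch swaps in a plain linear technique, truncated SVD (PCA) on standardized, concatenated omics blocks, and uses cosine similarity to subtype centroids. Both choices, the function names, and the synthetic two-subtype data are assumptions for illustration only.

```python
import numpy as np


def _zscore(blocks, means, stds):
    """Standardize each omics block with reference statistics, then concatenate."""
    return np.hstack([(b - m) / s for b, m, s in zip(blocks, means, stds)])


def fit_latent_space(tumor_blocks, n_factors):
    """Learn linear latent factors from tumor multi-omics blocks via truncated SVD.

    tumor_blocks: list of (n_tumors, p_b) arrays, e.g. expression and copy number.
    """
    means = [b.mean(axis=0) for b in tumor_blocks]
    stds = [b.std(axis=0) + 1e-8 for b in tumor_blocks]   # avoid division by zero
    _, _, vt = np.linalg.svd(_zscore(tumor_blocks, means, stds), full_matrices=False)
    return means, stds, vt[:n_factors].T                  # loadings: (p_total, k)


def embed(blocks, means, stds, loadings):
    """Project samples (tumors or cell lines) into the tumor-derived latent space."""
    return _zscore(blocks, means, stds) @ loadings


def best_matching_lines(tumor_z, subtype, line_z, top=3):
    """Rank cell lines by cosine similarity to each subtype centroid."""
    line_norm = line_z / (np.linalg.norm(line_z, axis=1, keepdims=True) + 1e-12)
    matches = {}
    for s in np.unique(subtype):
        centroid = tumor_z[subtype == s].mean(axis=0)
        sims = line_norm @ (centroid / (np.linalg.norm(centroid) + 1e-12))
        matches[int(s)] = np.argsort(-sims)[:top]
    return matches


# Synthetic example with two hypothetical subtypes and two omics layers.
rng = np.random.default_rng(1)
pattern = rng.normal(size=(2, 50))                        # subtype signatures
subtype = np.repeat([0, 1], 40)
tumors = pattern[subtype] + rng.normal(0, 0.5, (80, 50))
lines = pattern[[0, 0, 0, 1, 1, 1]] + rng.normal(0, 0.5, (6, 50))
tumor_blocks = [tumors[:, :30], tumors[:, 30:]]           # "expression", "copy number"
line_blocks = [lines[:, :30], lines[:, 30:]]

means, stds, loadings = fit_latent_space(tumor_blocks, n_factors=2)
matches = best_matching_lines(
    embed(tumor_blocks, means, stds, loadings), subtype,
    embed(line_blocks, means, stds, loadings),
)
```

    Cell lines are standardized with tumor statistics here, which ignores systematic tumor versus cell-line differences; a learned model such as the one described in the abstract may handle that shift differently.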

    Tissue-specific identification of multi-omics features for pan-cancer drug response prediction

    Current statistical models for drug response prediction and biomarker identification fall short in leveraging the shared and unique information from various cancer tissues and multi-omics profiles. We developed the mix-lasso model, which introduces an additional sample group penalty term to capture tissue-specific effects of features on pan-cancer response prediction. The mix-lasso model takes into account both the similarity between drug responses (i.e., multi-task learning) and the heterogeneity between multi-omics data (multi-modal learning). When applied to a large-scale pharmacogenomics dataset from the Cancer Therapeutics Response Portal, mix-lasso enabled accurate drug response predictions and identification of tissue-specific predictive features in the presence of various degrees of missing data, drug-drug correlations, and high-dimensional and correlated genomic and molecular features that often hinder the use of statistical approaches in drug response modeling. Compared with the tree lasso model, mix-lasso identified a smaller number of tissue-specific features, making the model more interpretable and stable for drug discovery applications. Publisher Copyright: © 2022 The Author(s). Peer reviewed.
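    To make "tissue-specific effects of features" concrete, here is a sketch of one simple way to separate shared from tissue-specific coefficients: a lasso fitted on an augmented design containing the original features plus one tissue-masked copy per tissue, so each feature gets a shared coefficient and per-tissue deviations. This is a swapped-in, simpler technique (a "shared plus deviation" lasso solved by coordinate descent), not the mix-lasso penalty from the paper, and it handles a single drug only. The data, penalty value, and function names are hypothetical.

```python
import numpy as np


def augment(X, tissue, n_tissues):
    """Design [X, X*1{tissue=0}, ..., X*1{tissue=K-1}]: shared + tissue-specific columns."""
    return np.hstack([X] + [X * (tissue == k)[:, None] for k in range(n_tissues)])


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]         # keep the residual in sync with b
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


rng = np.random.default_rng(0)
n, p, K = 300, 10, 3
X = rng.normal(size=(n, p))
tissue = rng.integers(0, K, n)
# Hypothetical truth: feature 0 acts in every tissue, feature 3 only in tissue 1.
y = 2.0 * X[:, 0] + 3.0 * X[:, 3] * (tissue == 1) + rng.normal(0.0, 0.5, n)

coef = lasso_cd(augment(X, tissue, K), y - y.mean(), lam=0.1)
shared, specific = coef[:p], coef[p:].reshape(K, p)
```

    The L1 penalty makes it cheaper to explain a pan-tissue effect once in `shared` than separately in each tissue, while an effect present in only one tissue ends up in that tissue's row of `specific`.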

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platform (MS- or NMR spectroscopy-based) used for data acquisition. Improved instrumentation in metabolomics generates increasingly complex datasets, creating the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and research for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most of the tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialized tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

    Transcriptomics data are relevant for addressing a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the number of publicly available omics datasets is constantly increasing, together with a plethora of methods made available to facilitate their analysis and interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art of data modelling applied to transcriptomics data in TGx. We show how benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathway (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or to identify specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models, and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics. Peer reviewed.
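    Benchmark dose analysis fits a dose-response curve for each gene and reports the dose at which the fitted response departs from the control level by a chosen benchmark response (BMR). A minimal sketch for a single gene follows, assuming an exponential model y = a·exp(b·d) fitted by least squares on log expression and a relative BMR of 10%. Dedicated BMD tools typically fit several candidate models by maximum likelihood and also report a lower confidence bound (BMDL), both omitted here. The dose design and expression values are hypothetical.

```python
import numpy as np


def fit_exponential(dose, response):
    """Fit y = a * exp(b * dose) by least squares on log(y); needs response > 0."""
    dose = np.asarray(dose, dtype=float)
    A = np.column_stack([np.ones_like(dose), dose])
    (log_a, b), *_ = np.linalg.lstsq(A, np.log(response), rcond=None)
    return float(np.exp(log_a)), float(b)


def benchmark_dose(b, bmr=0.10):
    """Dose at which the fitted curve differs from the control level by a relative bmr.

    For y = a*exp(b*d): a*exp(b*BMD) = a*(1 +/- bmr)  =>  BMD = ln(1 +/- bmr) / b.
    """
    if b == 0.0:
        return float("inf")                     # flat curve: no benchmark dose
    target = 1.0 + bmr if b > 0 else 1.0 - bmr  # up- or down-regulated gene
    return float(np.log(target) / b)


# Hypothetical dose-response for one gene: 5 doses in triplicate.
rng = np.random.default_rng(3)
dose = np.repeat([0.0, 1.0, 3.0, 10.0, 30.0], 3)
expr = 100.0 * np.exp(0.02 * dose) * np.exp(rng.normal(0.0, 0.02, dose.size))
a_hat, b_hat = fit_exponential(dose, expr)
bmd = benchmark_dose(b_hat)                     # true value here: ln(1.1)/0.02
```

    In a transcriptome-wide analysis this would be repeated per responsive gene, and the gene-level BMDs are then commonly summarized per pathway.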

    Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways

    Background: Mendelian randomization uses genetic variants, assumed to be instrumental variables for a particular exposure, to estimate the causal effect of that exposure on an outcome. If the instrumental variable criteria are satisfied, the resulting estimator is consistent even in the presence of unmeasured confounding and reverse causation. Methods: We extend the Mendelian randomization paradigm to investigate more complex networks of relationships between variables, in particular where some of the effect of an exposure on the outcome may operate through an intermediate variable (a mediator). If instrumental variables for the exposure and mediator are available, direct and indirect effects of the exposure on the outcome can be estimated, for example using either a regression-based method or structural equation models. The direction of effect between the exposure and a possible mediator can also be assessed. Methods are illustrated in an applied example considering causal relationships between body mass index, C-reactive protein and uric acid. Results: These estimators are consistent in the presence of unmeasured confounding if, in addition to the instrumental variable assumptions, the effects of both the exposure on the mediator and the mediator on the outcome are homogeneous across individuals and linear without interactions. Nevertheless, a simulation study demonstrates that even considerable heterogeneity in these effects does not lead to bias in the estimates. Conclusions: These methods can be used to estimate direct and indirect causal effects in a mediation setting, and have potential for the investigation of more complex networks of relationships between multiple interrelated exposures and disease outcomes.
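    A small simulation can show how separate instruments for the exposure and the mediator recover direct and indirect effects despite an unmeasured confounder. This sketch assumes the linear, homogeneous setting described above and uses simple Wald ratio estimates (covariance of instrument with outcome divided by covariance with exposure), taking the indirect effect as a product of coefficients and the direct effect as total minus indirect. It is one possible regression-style approach, not necessarily the exact estimator used in the paper; all effect sizes, allele frequencies, and names are hypothetical.

```python
import numpy as np


def wald_ratio(g, outcome, exposure):
    """Instrumental-variable (Wald) ratio: cov(G, outcome) / cov(G, exposure)."""
    return np.cov(g, outcome)[0, 1] / np.cov(g, exposure)[0, 1]


# Hypothetical simulation: linear, homogeneous effects with an unmeasured confounder u.
rng = np.random.default_rng(42)
n = 200_000
g_x = rng.binomial(2, 0.3, n)          # variant assumed to affect only the exposure
g_m = rng.binomial(2, 0.3, n)          # variant assumed to affect only the mediator
u = rng.normal(size=n)                 # unmeasured confounder of x, m and y
x = 0.5 * g_x + u + rng.normal(size=n)
m = 0.4 * x + 0.5 * g_m + u + rng.normal(size=n)
y = 0.3 * x + 0.6 * m + u + rng.normal(size=n)
# True effects: x->m 0.4, m->y 0.6, direct 0.3, indirect 0.4*0.6 = 0.24, total 0.54.

total = wald_ratio(g_x, y, x)          # total effect of x on y
x_to_m = wald_ratio(g_x, m, x)         # effect of x on the mediator
m_to_y = wald_ratio(g_m, y, m)         # effect of the mediator on y
indirect = x_to_m * m_to_y             # product of coefficients
direct = total - indirect              # difference method
naive = np.polyfit(x, y, 1)[0]         # confounded OLS slope, for contrast
```

    The naive regression slope of `y` on `x` is pulled far above the true total effect by `u`, while the instrument-based estimates stay close to the simulated values.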

    Clinical improvement of DM1 patients reflected by reversal of disease-induced gene expression in blood

    Background: Myotonic dystrophy type 1 (DM1) is an incurable multisystem disease caused by a CTG-repeat expansion in the DM1 protein kinase (DMPK) gene. The OPTIMISTIC clinical trial demonstrated positive and heterogeneous effects of cognitive behavioral therapy (CBT) on the capacity for activity and social participation in DM1 patients. Through a process of reverse engineering, this study aims to identify druggable molecular biomarkers associated with the clinical improvement in the OPTIMISTIC cohort. Methods: Using whole-blood samples collected during OPTIMISTIC, we performed paired mRNA sequencing for 27 patients before and after the CBT intervention. Linear mixed-effects models were used to identify biomarkers associated with the disease-causing CTG expansion and with the mean clinical improvement across all clinical outcome measures. Results: We identified 608 genes whose expression was significantly associated with the CTG-repeat expansion, as well as 1176 genes significantly associated with the average clinical response to the intervention. Remarkably, all 97 genes associated with both returned to more normal levels in patients who benefited the most from CBT. This main finding was replicated in an external mRNA dataset of DM1 patients and controls, singling these genes out as candidate biomarkers for therapy response. Among these candidate genes were DNAJB12, HDAC5, and TRIM8, each belonging to a protein family that is being studied in the context of neurological disorders or muscular dystrophies. Across the different gene sets, gene pathway enrichment analysis revealed disease-relevant impaired signaling in, among others, insulin-, metabolism-, and immune-related pathways. Furthermore, evidence for shared dysregulation with another neuromuscular disease, Duchenne muscular dystrophy, was found, suggesting a partial overlap in blood-based gene dysregulation. Conclusions: DM1-relevant disease signatures can be identified on a molecular level in peripheral blood, opening new avenues for drug discovery and therapy efficacy assessments.
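    The paired before/after design and the "return to more normal levels" check can be sketched in simplified form. The study used linear mixed-effects models; this sketch instead relies on the fact that with exactly two visits per patient and a random intercept per patient, taking the within-patient difference removes the intercept, so a per-gene regression of the expression change on a clinical improvement score approximates the association with clinical response. A gene is flagged as "reversing" when that slope has the opposite sign to its (hypothetical) CTG-repeat association and exceeds a hypothetical magnitude threshold. All data, coefficients, thresholds, and function names are illustrative assumptions.

```python
import numpy as np


def change_vs_improvement(before, after, improvement):
    """Per-gene OLS slope of the within-patient change (after - before) on improvement.

    before, after: (n_patients, n_genes) expression; improvement: (n_patients,).
    """
    delta = after - before                       # patient-specific intercepts cancel
    z = improvement - improvement.mean()
    return z @ (delta - delta.mean(axis=0)) / (z @ z)


def reversing_genes(ctg_coef, slopes, min_abs_slope=0.2):
    """Genes whose change runs opposite to their CTG-associated direction as
    improvement grows, i.e. expression moves back toward 'normal'."""
    opposite = np.sign(ctg_coef) * np.sign(slopes) < 0
    return np.flatnonzero(opposite & (np.abs(slopes) >= min_abs_slope))


# Hypothetical data: 27 patients (as in the abstract), 5 genes.
rng = np.random.default_rng(7)
n_pat, n_genes = 27, 5
improvement = rng.normal(size=n_pat)                     # mean clinical improvement score
true_slope = np.array([-1.0, 0.8, 0.0, 0.5, 0.0])
ctg_coef = np.array([0.8, -0.5, 0.4, 0.6, -0.3])         # hypothetical CTG associations
patient = rng.normal(0.0, 2.0, size=(n_pat, 1))          # random intercept per patient
before = 5.0 + patient + rng.normal(0.0, 0.2, (n_pat, n_genes))
after = (5.0 + patient + np.outer(improvement, true_slope)
         + rng.normal(0.0, 0.2, (n_pat, n_genes)))

slopes = change_vs_improvement(before, after, improvement)
candidates = reversing_genes(ctg_coef, slopes)
```

    A full analysis would also test each slope for significance and correct for multiple testing across genes; the fixed threshold here only stands in for that step.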