24 research outputs found

    Evaluation of O2PLS in Omics data integration

    Background: Rapid computational and technological developments have made large amounts of omics data available at different biological levels. It is becoming clear that simultaneous data analysis methods are needed for better interpretation and understanding of the underlying systems biology. Different methods have been proposed for this task, among them Partial Least Squares (PLS) related methods. To also deal with orthogonal variation, that is, systematic variation in one dataset that is unrelated to the other, we consider the Two-way Orthogonal PLS (O2PLS): an integrative data analysis method which is capable of modeling systematic variation, while providing more parsimonious models aiding interpretation. Results: A simulation study to assess the performance of O2PLS showed positive results in both low and higher dimensions. More noise (50 % of the data) only affected the systematic part estimates. A data analysis was conducted using data on metabolomics and transcriptomics from a large Finnish cohort (DILGOM). A previous sequential study, using the same data, showed significant correlations between the Lipo-Leukocyte (LL) module and lipoprotein metabolites. The O2PLS results were in agreement with these findings, identifying almost the same set of co-varying variables. Moreover, our integrative approach identified other associated genes and metabolites, while taking into account systematic variation in the data. Including orthogonal components enhanced overall fit, but the orthogonal variation was difficult to interpret. Conclusions: Simulations showed that the O2PLS estimates were close to the true parameters in both low and higher dimensions. In the presence of more noise (50 %), the orthogonal part estimates could not distinguish well between joint and unique variation. The joint estimates were not systematically affected. Simultaneous analysis with O2PLS on metabolome and transcriptome data showed that the LL module, together with VLDL and HDL metabolites, was important for the relation between the metabolome and the transcriptome. This is in agreement with an earlier study. In addition, more genes and metabolites were identified as important for the joint covariation.

    Discussion on the paper ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’ by Jeffrey S. Morris and Veerabhadran Baladandayuthapani

    Bioinformatics is an important research area for statisticians. This discussion adds some topics to the paper, namely statistical contributions to detecting differentially expressed genes, to protein structure prediction, and to the analysis of highly correlated features in Glycomics datasets.

    Integrating omics datasets with the OmicsPLS package

    Background: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, no such software is available for O2PLS. Results: We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard and a faster alternative cross-validation method are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data. Conclusions: We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS").

    The application of omics techniques to understand the role of the gut microbiota in inflammatory bowel disease

    The aetiopathogenesis of inflammatory bowel diseases (IBD) involves the complex interaction between a patient’s genetic predisposition, environment, gut microbiota and immune system. Currently, however, it is not known if the distinctive perturbations of the gut microbiota that appear to accompany both Crohn’s disease and ulcerative colitis are the cause of, or the result of, the intestinal inflammation that characterizes IBD. With the utilization of novel systems biology technologies, we can now begin to understand not only details about compositional changes in the gut microbiota in IBD, but increasingly also the alterations in microbiota function that accompany these. Technologies such as metagenomics, metataxomics, metatranscriptomics, metaproteomics and metabonomics are therefore allowing us a deeper understanding of the role of the microbiota in IBD. Furthermore, the integration of these systems biology technologies through advancing computational and statistical techniques is beginning to reveal the microbiome interactions that contribute to both health and diseased states in IBD. This review aims to explore how such systems biology technologies are advancing our understanding of the gut microbiota, and their potential role in delineating the aetiology, development and clinical care of IBD.

    Statistical Integration of Heterogeneous Data with PO2PLS

    The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for testing the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS. Supplementary materials for this article are available online.
    Comment: 36 pages, 4 figures, submitted to the Journal of the American Statistical Association

    A concise review on multi-omics data integration for terroir analysis in Vitis vinifera

    Mini review. Vitis vinifera (grapevine) is one of the most important fruit crops, both for fresh consumption and wine and spirit production. The term terroir is frequently used in viticulture and the wine industry to relate wine sensory attributes to its geographic origin. Although grapevine can be cultivated in a wide range of environments, differences in growing conditions have a significant impact on fruit traits that ultimately affect wine quality. Understanding how fruit quality and yield are controlled at a molecular level in grapevine in response to environmental cues has been a major driver of research. Advances in the areas of genomics, epigenomics, transcriptomics, proteomics and metabolomics have significantly increased our knowledge of the abiotic regulation of yield and quality in many crop species, including V. vinifera. The integrated analysis of multiple ‘omics’ can give us the opportunity to better understand how plants modulate their response to different environments. However, ‘omics’ technologies provide a large amount of biological data and its interpretation is not always straightforward, especially when different ‘omic’ results are combined. Here we examine the current strategies used to integrate multi-omics, and how these have been used in V. vinifera. In addition, we discuss the importance of including epigenomics data when integrating omics data, as epigenetic mechanisms could play a major role as an intermediary between the environment and the genome.
    Pastor Jullian Fabres, Cassandra Collins, Timothy R. Cavagnaro and Carlos M. Rodríguez López

    Amorphophallus muelleri activates ferulic acid and phenylpropane biosynthesis pathways to defend against Fusarium solani infection

    Amorphophallus sp. is an economically important crop for rural revitalization in southwest China. However, Fusarium solani often infects Amorphophallus sp. corms during storage, damaging the corm quality and affecting leaf elongation and flowering in the subsequent crop. In this study, the mechanism of resistance to F. solani was investigated in the leaf bud and flower bud corms of Amorphophallus muelleri through transcriptome and metabolome analyses. In total, 42.52 Gb of clean reads and 1,525 metabolites were obtained from 12 samples, comprising 3 samples each of disease-free leaf bud corms (LC), leaf bud corms inoculated with F. solani for three days (LD), disease-free flower bud corms (FC), and flower bud corms inoculated with F. solani for three days (FD). Transcriptome, metabolome, and conjoint analyses showed that ‘MAPK signal transduction’, ‘plant-pathogen interaction’, ‘plant hormone signal transduction’, and other secondary metabolite biosynthesis pathways, including ‘phenylpropane biosynthesis’, ‘arachidonic acid metabolism’, ‘stilbene, diarylheptane and gingerolin biosynthesis’, and ‘isoquinoline alkaloids biosynthesis’, among others, were involved in the defense response of A. muelleri to F. solani. Ultimately, the expression of six genes of interest (AmCDPK20, AmRBOH, AmWRKY33, Am4CL, AmPOD and AmCYP73A1) was validated by real-time fluorescence quantitative polymerase chain reaction, and the results indicated that these genes were involved in the response of A. muelleri to F. solani. Ferulic acid inhibited the growth of F. solani, reducing the harm caused by F. solani to A. muelleri corms to a certain extent. Overall, this study lays a strong foundation for further investigation of the interaction between A. muelleri and F. solani, and provides a list of genes for the future breeding of F. solani-resistant A. muelleri cultivars.

    Multi-omics integration identifies key upstream regulators of pathomechanisms in hypertrophic cardiomyopathy due to truncating MYBPC3 mutations

    BACKGROUND: Hypertrophic cardiomyopathy (HCM) is the most common genetic disease of the cardiac muscle, frequently caused by mutations in MYBPC3. However, little is known about the upstream pathways and key regulators causing the disease. Therefore, we employed a multi-omics approach to study the pathomechanisms underlying HCM, comparing patient hearts harboring MYBPC3 mutations to control hearts. RESULTS: Using H3K27ac ChIP-seq and RNA-seq we obtained 9310 differentially acetylated regions and 2033 differentially expressed genes, respectively, between 13 HCM and 10 control hearts. We obtained 441 differentially expressed proteins between 11 HCM and 8 control hearts using proteomics. By integrating multi-omics datasets, we identified a set of DNA regions and genes that differentiate HCM from control hearts and 53 protein-coding genes as the major contributors. This comprehensive analysis consistently points toward altered extracellular matrix formation, muscle contraction, and metabolism. Therefore, we studied enriched transcription factor (TF) binding motifs and identified 9 motif-encoded TFs, including KLF15, ETV4, AR, CLOCK, ETS2, GATA5, MEIS1, RXRA, and ZFX. Selected candidates were examined in stem cell-derived cardiomyocytes with and without mutated MYBPC3. Furthermore, we observed an abundance of acetylation signals and transcripts derived from cardiomyocytes compared to non-myocyte populations. CONCLUSIONS: By integrating histone acetylome, transcriptome, and proteome profiles, we identified major effector genes and protein networks that drive the pathological changes in HCM with mutated MYBPC3. Our work identifies 38 highly affected protein-coding genes as potential plasma HCM biomarkers and 9 TFs as potential upstream regulators of these pathomechanisms that may serve as possible therapeutic targets.

    Transcriptomic and metabolomic correlation analysis: effect of initial SO2 addition on higher alcohol synthesis in Saccharomyces cerevisiae and identification of key regulatory genes

    Introduction: Higher alcohols are volatile compounds produced during alcoholic fermentation that affect the quality and safety of the final product. This study used a correlation analysis of transcriptomics and metabolomics to study the impact of the initial addition of SO2 (30, 60, and 90 mg/L) on the synthesis of higher alcohols in Saccharomyces cerevisiae EC1118a and to identify key genes and metabolic pathways involved in their metabolism. Methods: Transcriptomics and metabolomics correlation analyses were performed, and differentially expressed genes (DEGs) and differential metabolites were identified. Single-gene knockouts targeting genes in important pathways were generated to study the roles of key genes involved in the regulation of higher alcohol production. Results: We found that, as the SO2 concentration increased, the production of total higher alcohols showed an overall trend of first increasing and then decreasing. Multi-omics correlation analysis revealed that the addition of SO2 affected carbon metabolism (ko01200), pyruvate metabolism (ko00620), glycolysis/gluconeogenesis (ko00010), the pentose phosphate pathway (ko00030), and other metabolic pathways, thereby changing the precursor substances. The availability of SO2 indirectly affects the formation of higher alcohols. In addition, excessive SO2 affected the growth of the strain, leading to the emergence of a lag phase. We screened the ten most likely genes and constructed recombinant strains to evaluate the impact of each gene on the formation of higher alcohols. The results showed that ADH4, SER33, and GDH2 are important genes of alcohol metabolism in S. cerevisiae. The isoamyl alcohol content of the EC1118a-ADH4 strain decreased by 21.003%; the isobutanol content of the EC1118a-SER33 strain was reduced by 71.346%; and the 2-phenylethanol content of the EC1118a-GDH2 strain was reduced by 25.198%. Conclusion: This study lays a theoretical foundation for investigating the mechanism of initial addition of SO2 in the synthesis of higher alcohols in S. cerevisiae, uncovering DEGs and key metabolic pathways related to the synthesis of higher alcohols, and provides guidance for regulating these mechanisms.