712 research outputs found

    Model-based joint visualization of multiple compositional omics datasets

    The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose a visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate our method and compare it with other data integration techniques. Our algorithm is available in the R-package combi

    A flexible and versatile framework for statistical design and analysis of quantitative mass spectrometry-based proteomic experiments

    Quantitative mass spectrometry (MS)-based proteomics is an indispensable technology for biological and clinical research. As the proteomics field grows, MS-based proteomic workflows are becoming more complex and diverse. The accuracy and throughput of MS measurements and of signal processing tools have increased dramatically. However, many existing statistical tools and workflows have not kept pace with these technological developments. There is therefore a need for flexible statistical tools that reflect diverse and complex workflows, are computationally efficient for large datasets, and maximize the reproducibility of the results. We propose a family of linear mixed effects models, and a split-plot view of the experimental design, that represent measurements from quantitative mass spectrometry-based proteomics. The whole plot part of the design reflects the structure of the biological variation of the experiment, such as case-control design, paired design, or time-course design. The subplot part of the design reflects the structure of the technological variation, such as fragmentation patterns, labeling strategy, and presence of multiple peptides per protein. We propose an estimation procedure that separately estimates the parameters of the subplot and the whole plot parts of the design, to maximize the flexibility of the model, increase the speed of the analysis, and facilitate the interpretation. The proposed modeling framework was validated using 9 controlled mixtures and 10 experimental datasets from targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS), where signals were extracted with multiple signal processing tools. We implemented the proposed method in the software package MSstats, which checks the correctness of the user input, recognizes arbitrarily complex experimental designs, visualizes the data, and performs statistical modeling and inference. It is interoperable with other existing computational tools such as Skyline

    Statistical methods for differential proteomics at peptide and protein level

    Development of data processing methods for high resolution mass spectrometry-based metabolomics with an application to human liver transplantation

    Direct Infusion (DI) Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry (MS) is becoming a popular measurement platform in metabolomics. This thesis aims to advance the data processing and analysis pipeline of DI FT-ICR based metabolomics, and to broaden its applicability to clinical research. To meet the first objective, the thesis addresses the issue of missing data in the final data matrix, which contains the relative abundances of metabolites measured for each sample analysed. The nature of these missing data and their effect on subsequent data analyses are investigated. Eight common and/or easily accessible missing data estimation algorithms are examined, and a three-stage approach is proposed to aid the identification of the optimal one. Finally, a novel survival analysis approach is introduced and assessed as an alternative way of treating missing data prior to univariate analysis. To address the second objective, DI FT-ICR MS based metabolomics is assessed in terms of its applicability to research investigating metabolomic changes occurring in liver grafts throughout human orthotopic liver transplantation (OLT). The feasibility of applying this approach in a clinical setting is validated, and its potential to provide a wealth of novel metabolic information associated with OLT is demonstrated