1,908 research outputs found

    Semi-supervised LC/MS alignment for differential proteomics

    Motivation: Mass spectrometry (MS) combined with high-performance liquid chromatography (LC) has received considerable attention for high-throughput analysis of proteomes. Isotopic labeling techniques such as ICAT [5,6] have been successfully applied to derive differential quantitative information for two protein samples, but at the price of significantly increased complexity of the experimental setup. To overcome these limitations, we consider a label-free setting where correspondences between elements of two samples have to be established prior to the comparative analysis. The alignment between samples is achieved by nonlinear robust ridge regression. The correspondence estimates are guided in a semi-supervised fashion by prior information which is derived from sequenced tandem mass spectra. Results: The semi-supervised method for finding correspondences was successfully applied to aligning highly complex protein samples, even if they exhibit large variations due to different biological conditions. A large-scale experiment clearly demonstrates that the proposed method bridges the gap between statistical data analysis and label-free quantitative differential proteomics. Availability: The software will be available on the website. Contact: [email protected]
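
The alignment idea in this abstract can be illustrated with a small sketch. It assumes that peptides sequenced by MS/MS in both runs supply anchor pairs of retention times, and it fits a smooth mapping between the two time axes by ridge regression on a polynomial basis. The polynomial basis, the function names, and the penalty value are assumptions made for illustration. The robust loss mentioned in the abstract is deliberately left out, so this is plain ridge regression and not the authors' method.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_ridge_poly(x, y, degree=2, lam=1e-3):
    """Fit y ~ sum_k w[k] * x**k with an L2 penalty on all non-intercept weights."""
    n = degree + 1
    # Normal equations: (X^T X + lam * I') w = X^T y, intercept unpenalized.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(n)]
    for i in range(1, n):
        A[i][i] += lam
    return solve(A, b)


def predict(w, x):
    """Map a retention time from run A onto the time axis of run B."""
    return sum(wk * x ** k for k, wk in enumerate(w))


# Hypothetical anchor pairs (minutes): the same peptides identified in both runs.
rt_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rt_b = [1.0 + 2.0 * t + 0.5 * t * t for t in rt_a]
weights = fit_ridge_poly(rt_a, rt_b, degree=2, lam=0.0)
```

With `lam=0.0` the fit reduces to ordinary least squares; a positive `lam` trades some fit quality for a smoother, less anchor-sensitive mapping.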

    DART-ID increases single-cell proteome coverage.

    Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those comprised of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional datapoints provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net
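
The way retention-time evidence can sharpen the confidence of a peptide-spectrum match follows from Bayes' rule, sketched below. The sketch assumes a normal density around the predicted RT for correct matches and a broad normal density over the whole run for incorrect ones. These densities, the function names, and all numbers are illustrative assumptions and are not taken from the DART-ID software.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_pep(pep, rt_observed, rt_predicted, sigma_correct,
               rt_null_mean, rt_null_sd):
    """Posterior error probability of a match after observing its retention time.

    pep is the prior probability that the match is incorrect (from the spectrum alone).
    """
    like_correct = normal_pdf(rt_observed, rt_predicted, sigma_correct)
    like_incorrect = normal_pdf(rt_observed, rt_null_mean, rt_null_sd)
    numerator = pep * like_incorrect
    return numerator / (numerator + (1.0 - pep) * like_correct)


# Hypothetical match with a 50% prior error probability.
close = update_pep(0.5, rt_observed=30.0, rt_predicted=30.0, sigma_correct=0.5,
                   rt_null_mean=30.0, rt_null_sd=20.0)
far = update_pep(0.5, rt_observed=40.0, rt_predicted=30.0, sigma_correct=0.5,
                 rt_null_mean=30.0, rt_null_sd=20.0)
```

A match observed at its predicted RT has its error probability lowered, while one eluting far from the prediction has it raised.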

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources, in the form of tools, software, and databases, is currently lacking. Thus, here we provide an overview of freely-available, and open-source, tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview Table.

    Omics assisted N-terminal proteoform and protein expression profiling on methionine aminopeptidase 1 (MetAP1) deletion

    Excision of the N-terminal initiator methionine (iMet) residue from nascent peptide chains is an essential and omnipresent protein modification carried out by methionine aminopeptidases (MetAPs) that accounts for a major source of N-terminal proteoform diversity. Although MetAP2 is known to be implicated in processes such as angiogenesis and proliferation in mammals, the physiological role of MetAP1 is much less clear. In this report we studied the omics-wide effects of human MetAP1 deletion and general MetAP inhibition. The levels of iMet retention are inversely correlated with cellular proliferation rates. Further, despite the increased MetAP2 expression on MetAP1 deletion, MetAP2 was unable to restore processing of Met-Ser-, Met-Pro-, and Met-Ala- starting N termini as inferred from the iMet retention profiles observed, indicating a higher activity of MetAP1 over these N termini. Proteome and transcriptome expression profiling point to differential expression of proteins implicated in lipid metabolism, cytoskeleton organization, cell proliferation and protein synthesis upon perturbation of MetAP activity

    Integrative analysis of extracellular and intracellular bladder cancer cell line proteome with transcriptome: improving coverage and validity of -omics findings

    Characterization of disease-associated proteins improves our understanding of disease pathophysiology. Obtaining a comprehensive coverage of the proteome is challenging, mainly due to limited statistical power and an inability to verify hundreds of putative biomarkers. In an effort to address these issues, we investigated the value of parallel analysis of compartment-specific proteomes with an assessment of findings by cross-strategy and cross-omics (proteomics-transcriptomics) agreement. The validity of the individual datasets and of a "verified" dataset based on cross-strategy/omics agreement was defined following their comparison with published literature. The proteomic analysis of the cell extract, Endoplasmic Reticulum/Golgi apparatus and conditioned medium of T24 vs. its metastatic subclone T24M bladder cancer cells allowed the identification of 253, 217 and 256 significant changes, respectively. Integration of these findings with transcriptomics resulted in 253 "verified" proteins based on the agreement of at least 2 strategies. This approach revealed findings of higher validity, as supported by a higher level of agreement in the literature data than those of individual datasets. As an example, the coverage and shortlisting of targets in the IL-8 signalling pathway are discussed. Collectively, an integrative analysis appears a safer way to evaluate -omics datasets and ultimately generate models from valid observations.

    Computational Framework for Data-Independent Acquisition Proteomics.

    Mass spectrometry (MS) is one of the main techniques for high throughput discovery- and targeted-based proteomics experiments. The most popular method for MS data acquisition has been data dependent acquisition (DDA) strategy which primarily selects high abundance peptides for MS/MS sequencing. DDA incorporates stochastic data acquisitions to avoid repetitive sequencing of same peptide, resulting in relatively irreproducible results for low abundance peptides between experiments. Data independent acquisition (DIA), in which peptide fragment signals are systematically acquired, is emerging as a promising alternative to address the DDA's stochasticity. DIA results in more complex signals, posing computational challenges for complex sample and high-throughput analysis. As a result, targeted extraction which requires pre-existing spectral libraries has been the most commonly used approach for automated DIA data analysis. However, building spectral libraries requires additional amount of analysis time and sample materials which are the major barriers for most research groups. In my dissertation, I develop a computational tool called DIA-Umpire, which includes computational and signal processing algorithms to enable untargeted DIA identification and quantification analysis without any prior spectral library. In the first study, a signal feature detection algorithm is developed to extract and assemble peptide precursor and fragment signals into pseudo MS/MS spectra which can be analyzed by the existing DDA untargeted analysis tools. This novel step enables direct and untargeted (spectral library-free) DIA identification analysis and we show the performance using complex samples including human cell lysate and glycoproteomics datasets. In the second study, a hybrid approach is developed to further improve the DIA quantification sensitivity and reproducibility. 
The performance of DIA-Umpire quantification approach is demonstrated using an affinity-purification mass spectrometry experiment for protein-protein interaction analysis. Lastly, in the third study, I improve the DIA-Umpire pipeline for data obtained from the Orbitrap family of mass spectrometers. Using public datasets, I show that the improved version of DIA-Umpire is capable of highly sensitive, untargeted analysis of DIA data for the data generated using Orbitrap family of mass spectrometers. The dissertation work addresses the barriers of DIA analysis and should facilitate the adoption of DIA strategy for a broad range of discovery proteomics applications. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120699/1/tsouc_1.pd

    amsrpm: Robust Point Matching for Retention Time Aligment of LC/MS Data with R

    Proteomics is the study of the abundance, function and dynamics of all proteins present in a living organism, and mass spectrometry (MS) has become its most important tool due to its unmatched sensitivity, resolution and potential for high-throughput experimentation. A frequently used variant of mass spectrometry is coupled with liquid chromatography (LC) and is denoted as "LC/MS". It produces two-dimensional raw data, where significant distortions along one of the dimensions can occur between different runs on the same instrument, and between instruments. A compensation of these distortions is required to allow for comparisons between and inference based on different experiments. This article introduces the amsrpm software package. It implements a variant of the Robust Point Matching (RPM) algorithm that is tailored for the alignment of LC and LC/MS experiments. Problem-specific enhancements include a specialized dissimilarity measure, and means to enforce smoothness and monotonicity of the estimated transformation function. The algorithm does not rely on pre-specified landmarks; it is insensitive towards outliers and capable of modeling nonlinear distortions. Its usefulness is demonstrated using both simulated and experimental data. The software is available as an open source package for the statistical programming language R.

    Using a spike-in experiment to evaluate analysis of LC-MS data

    BACKGROUND: Recent advances in liquid chromatography-mass spectrometry (LC-MS) technology have led to more effective approaches for measuring changes in peptide/protein abundances in biological samples. Label-free LC-MS methods have been used for extraction of quantitative information and for detection of differentially abundant peptides/proteins. However, difference detection by analysis of data derived from label-free LC-MS methods requires various preprocessing steps including filtering, baseline correction, peak detection, alignment, and normalization. Although several specialized tools have been developed to analyze LC-MS data, determining the most appropriate computational pipeline remains challenging partly due to lack of established gold standards. RESULTS: The work in this paper is an initial study to develop a simple model with "presence" or "absence" condition using spike-in experiments and to be able to identify these "true differences" using available software tools. In addition to the preprocessing pipelines, choosing appropriate statistical tests and determining critical values are important. We observe that individual statistical tests could lead to different results due to different assumptions and employed metrics. It is therefore preferable to incorporate several statistical tests for either exploration or confirmation purpose. CONCLUSIONS: The LC-MS data from our spike-in experiment can be used for developing and optimizing LC-MS data preprocessing algorithms and to evaluate workflows implemented in existing software tools. Our current work is a stepping stone towards optimizing LC-MS data acquisition and testing the accuracy and validity of computational tools for difference detection in future studies that will be focused on spiking peptides of diverse physicochemical properties in different concentrations to better represent biomarker discovery of differentially abundant peptides/proteins

    Optimized data processing algorithms for biomarker discovery by LC-MS

    This thesis reports techniques and optimization of algorithms to analyse label-free LC-MS data sets for clinical proteomics studies with an emphasis on time alignment algorithms and feature selection methods. The presented work is intended to support ongoing medical and biomarker research. The thesis starts with a review of important steps in a data processing pipeline of label-free Liquid Chromatography-Mass Spectrometry (LC-MS) data. The first part of the thesis discusses an optimization strategy for aligning complex LC-MS chromatograms. It explains the combination of time alignment algorithms (Correlation Optimized Warping, Parametric Time Warping and Dynamic Time Warping) with a Component Detection Algorithm to overcome limitations of the original methods that use Total Ion Chromatograms when applied to highly complex data. A novel reference selection method to facilitate the pre-alignment process and an approach to globally compare the quality of time alignment using overlapping peak area are introduced and used in the study. The second part of this thesis highlights an ongoing challenge faced in the field of biomarker discovery, where improvements in instrument resolution coupled with low sample numbers have led to a large discrepancy between the number of measurements and the number of measured variables. A comparative study of various commonly used feature selection methods for tackling this problem is presented. These methods are applied to spiked urine data sets with variable sample size and class separation to mimic typical conditions of biomarker research. Finally, the thesis closes with a summary and a discussion of the remaining challenges in the data processing field.
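
Of the time alignment algorithms named in this abstract, Dynamic Time Warping is the simplest to show in a few lines. The sketch below is the textbook dynamic-programming formulation on two one-dimensional intensity traces with an absolute-difference cost. It is not the thesis's implementation, it does not include the Component Detection Algorithm step, and the example traces are invented.

```python
def dtw(a, b):
    """Return (total cost, warping path) for aligning sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # a advances alone
                                 cost[i][j - 1],       # b advances alone
                                 cost[i - 1][j - 1])   # both advance
    # Backtrack from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Hypothetical chromatogram traces: the second one is delayed by one scan.
reference = [0, 1, 2, 1, 0]
sample = [0, 0, 1, 2, 1, 0]
total_cost, warping_path = dtw(reference, sample)
```

The warping path lists index pairs `(i, j)` meaning that point `i` of the reference is matched to point `j` of the sample, which is the information needed to shift one time axis onto the other.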