    Pipelines and Systems for Threshold-Avoiding Quantification of LC-MS/MS Data

    The accurate processing of complex liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data from biological samples is a major challenge for metabolomics, proteomics, and related approaches. Here, we present the pipelines and systems for threshold-avoiding quantification (PASTAQ) LC-MS/MS preprocessing toolset, which allows highly accurate quantification of data-dependent acquisition LC-MS/MS datasets. PASTAQ performs compound quantification using single-stage (MS1) data and implements novel algorithms for high-performance and accurate quantification, retention time alignment, feature detection, and linking annotations from multiple identification engines. PASTAQ offers straightforward parameterization and automatic generation of quality control plots for data and preprocessing assessment. Compared with widely used proteomics preprocessing tools, this design yields smaller variance when analyzing replicates of proteomes mixed at known ratios and allows peptides to be detected over a larger dynamic concentration range. The performance of the pipeline is also demonstrated on a biological human serum dataset for the identification of gender-related proteins.

    Bioinformatics and Statistics: LC-MS(/MS) Data Preprocessing for Biomarker Discovery

    This chapter provides an overview of the main steps of LC-MS(/MS) data pre-processing workflows. It discusses the main characteristics of these steps and provides a detailed functional description of the currently available algorithmic approaches. As an example, the chapter presents the main steps of the Threshold-Avoiding Proteomics Pipeline, which includes several novel concepts to increase the accuracy of peptide quantification and to extend the dynamic concentration range of extracted compounds. The chapter further outlines a quality control method to assess and compare the relative performance of various LC-MS(/MS) data pre-processing workflows integrated in the msCompare framework, using a set of differentially spiked LC-MS datasets. It discusses the most common quantitative data pre-processing errors and provides visualization methods to identify them. Finally, the chapter outlines future trends in LC-MS(/MS) data pre-processing algorithm development, stressing the need for easy-to-use, high-throughput bioinformatics platforms that use modern parallel computational resources to alleviate current data pre-processing and analysis bottlenecks.

    Threshold-Avoiding Proteomics Pipeline

    We present a new proteomics analysis pipeline focused on maximizing the dynamic range of detected molecules in liquid chromatography-mass spectrometry (LC-MS) data and accurately quantifying low-abundance peaks to identify those with biological relevance. Although there has been much work to improve the quality of data derived from LC-MS instruments, the goal of this study was to extend the dynamic range of analyzed compounds by making full use of the information available within each data set and across multiple related chromatograms in an experiment. Our aim was to distinguish low-abundance signal peaks from noise by noting their coherent behavior across multiple data sets. Central to this is the need to delay the culling of noise peaks until the final peak-matching stage of the pipeline, when peaks from a single sample appear in the context of all others. The application of thresholds that might discard signal peaks early is thereby avoided, hence the name TAPP: threshold-avoiding proteomics pipeline. TAPP focuses on quantitative low-level processing of raw LC-MS data and includes novel preprocessing, peak detection, time alignment, and cluster-based matching. We demonstrate the performance of TAPP on biologically relevant sample data consisting of porcine cerebrospinal fluid spiked over a wide range of concentrations with horse heart cytochrome c.
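The threshold-avoiding idea (keep every detected peak, match across samples, and only then discard peaks that do not behave coherently) can be illustrated with a small sketch. This is not TAPP's cluster-based matching: the greedy tolerance clustering, the `Peak` record, the functions `match_peaks` and `coherent_clusters`, and the parameters `mz_tol`, `rt_tol` and `min_samples` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    # Hypothetical peak record; no intensity threshold is applied when peaks are collected.
    sample: str
    mz: float
    rt: float
    intensity: float


def match_peaks(peaks, mz_tol, rt_tol):
    """Greedy stand-in for cross-sample matching (not TAPP's algorithm).

    Peaks are visited from most to least intense; each joins the first cluster whose
    seed peak lies within both tolerances and that has no peak from the same sample
    yet, otherwise it seeds a new cluster.
    """
    clusters = []
    for p in sorted(peaks, key=lambda q: q.intensity, reverse=True):
        for c in clusters:
            seed = c[0]
            same_sample = p.sample in {q.sample for q in c}
            if abs(p.mz - seed.mz) <= mz_tol and abs(p.rt - seed.rt) <= rt_tol and not same_sample:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def coherent_clusters(peaks, mz_tol, rt_tol, min_samples):
    """Cull only at the end: keep clusters observed in at least `min_samples` samples."""
    return [c for c in match_peaks(peaks, mz_tol, rt_tol)
            if len({p.sample for p in c}) >= min_samples]
```

For example, a faint peak near m/z 500 seen in three samples survives even though each occurrence is weaker than a single intense peak present in only one sample; a per-sample intensity cutoff applied before matching would have discarded it.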

    Two-dimensional method for time aligning liquid chromatography-mass spectrometry data

    We describe a new time alignment method that takes advantage of both dimensions of LC-MS data to resolve ambiguities in peak matching while remaining computationally efficient. This approach, Warp2D, combines peak extraction with a two-dimensional correlation function to provide a reliable alignment scoring function that is insensitive to spurious peaks and background noise. One-dimensional alignment methods are often based on the total-ion-current elution profile of the spectrum and cannot distinguish peaks of different masses. Our approach uses one-dimensional alignment in time, but with a scoring function derived from the overlap of peaks in two dimensions, thereby combining the specificity of two-dimensional methods with the computational performance of one-dimensional methods. The peaks are approximated as two-dimensional Gaussians of varying width. This approximation allows peak overlap (the measure of alignment quality) to be calculated analytically, without computationally intensive numerical integration in two dimensions. To demonstrate the general applicability of Warp2D, we chose a variety of complex samples that have substantial biological and analytical variability, including human serum and urine. Based on the reduced standard deviation of peak elution times after warping, we show that Warp2D works well with these diverse sample sets and requires minimal parameter tuning. The combination of high computational speed, robustness with complex samples, and lack of need for detailed tuning makes this alignment method well suited to high-throughput LC-MS studies.
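Because the product of two Gaussians integrates in closed form, the overlap of two axis-aligned 2D Gaussian peaks factorizes into a retention-time term and an m/z term, each equal to sqrt(2π)·σ1σ2/sqrt(σ1² + σ2²)·exp(−(c1 − c2)²/(2(σ1² + σ2²))), scaled by the two peak heights. The sketch below assumes that peak model; the `Peak` fields and the helpers `gaussian_overlap_1d`, `peak_overlap`, `alignment_score` and `best_shift` are hypothetical names. As a simplification it scores only a single global time shift chosen from a grid, which is not the warping scheme Warp2D itself uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    rt: float        # retention-time centre
    mz: float        # m/z centre
    sigma_rt: float  # Gaussian width along retention time
    sigma_mz: float  # Gaussian width along m/z
    height: float


def gaussian_overlap_1d(c1, s1, c2, s2):
    """Closed-form integral of exp(-(x-c1)^2/(2 s1^2)) * exp(-(x-c2)^2/(2 s2^2)) over x."""
    var = s1 * s1 + s2 * s2
    return (math.sqrt(2.0 * math.pi) * s1 * s2 / math.sqrt(var)
            * math.exp(-((c1 - c2) ** 2) / (2.0 * var)))


def peak_overlap(a, b):
    """Overlap of two axis-aligned 2D Gaussian peaks: product of the two 1D factors."""
    return (a.height * b.height
            * gaussian_overlap_1d(a.rt, a.sigma_rt, b.rt, b.sigma_rt)
            * gaussian_overlap_1d(a.mz, a.sigma_mz, b.mz, b.sigma_mz))


def alignment_score(reference, sample, shift):
    """Total 2D overlap after shifting every sample peak by `shift` in time only."""
    shifted = [Peak(p.rt + shift, p.mz, p.sigma_rt, p.sigma_mz, p.height) for p in sample]
    return sum(peak_overlap(r, s) for r in reference for s in shifted)


def best_shift(reference, sample, shifts):
    """Simplification: pick the single global shift with the highest overlap score."""
    return max(shifts, key=lambda d: alignment_score(reference, sample, d))
```

The m/z factor is what gives the score its two-dimensional specificity: two peaks that coincide in time but differ in m/z by many widths contribute essentially nothing, so unlike a total-ion-current comparison they cannot produce a spurious match.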

    Correlation Queries for Mass Spectrometry Imaging

    Mass spectrometry imaging (MSI) generates large volumetric data sets consisting of mass-to-charge ratio (m/z), ion current, and x,y coordinate location. These data sets usually serve limited purposes centered on measuring the distribution of a small set of ions with known m/z. Such earmarked queries consider only a fraction of the full mass spectrum captured, and there are few tools to assist the exploration of the remaining volume of unknown data in terms of demonstrating similarity or discordance in tissue compartment distribution patterns. Here we present a novel, interactive approach to extract information from MSI data that relies on precalculated data structures to perform queries of large data sets on a typical laptop. We have devised methods to query the full volume to find new m/z values of potential interest based on similarity to biological structures or to the spatial distribution of known ions. We describe these query methods in detail and provide examples demonstrating the power of the methods to "discover" m/z values of ions that have such potentially interesting correlations. The "discovered" ions may be further correlated with either positional locations or the coincident distribution of other ions using successive queries. Finally, we show it is possible to gain insight into the fragmentation pattern of the parent molecule from such correlations. The ability to discover new ions of interest in the unknown bulk of an MSI data set offers the potential to further our understanding of biological and physiological processes related to health and disease.
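A correlation query of this kind can be sketched by precomputing a z-scored copy of every ion image once, so that each later query against a reference pattern (the image of a known ion, or a binary mask outlining a tissue structure) reduces to one dot product per m/z channel, which equals the Pearson correlation coefficient. The `CorrelationIndex` class, its `query` method and the dict-of-flattened-images input are hypothetical illustrations; the source does not describe the precalculated data structures it actually uses.

```python
import math


def zscore(values):
    """Standardize to mean 0 and population standard deviation 1 (constant input -> zeros)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


class CorrelationIndex:
    """Hypothetical precalculated index: one z-scored, flattened image per m/z channel."""

    def __init__(self, images):
        # images: dict mapping an m/z value to a flattened list of pixel intensities,
        # all lists sharing the same pixel order and length.
        self.n_pixels = len(next(iter(images.values())))
        self.channels = {mz: zscore(px) for mz, px in images.items()}

    def query(self, reference, top_k=5):
        """Return the top_k (m/z, Pearson r) pairs for a reference image or mask."""
        ref = zscore(reference)
        scores = {mz: sum(a * b for a, b in zip(ref, z)) / self.n_pixels
                  for mz, z in self.channels.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Strongly negative scores are also informative: they flag ions whose distribution is discordant with the queried structure. A "discovered" m/z can be fed back as the next reference image for a successive query.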