63 research outputs found

    Pipelines and Systems for Threshold-Avoiding Quantification of LC-MS/MS Data

    The accurate processing of complex liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data from biological samples is a major challenge for metabolomics, proteomics, and related approaches. Here, we present the pipelines and systems for threshold-avoiding quantification (PASTAQ) LC-MS/MS preprocessing toolset, which allows highly accurate quantification of data-dependent acquisition LC-MS/MS datasets. PASTAQ performs compound quantification using single-stage (MS1) data and implements novel algorithms for high-performance and accurate quantification, retention time alignment, feature detection, and linking annotations from multiple identification engines. PASTAQ offers straightforward parameterization and automatic generation of quality control plots for data and preprocessing assessment. This design results in smaller variance when analyzing replicates of proteomes mixed with known ratios and allows the detection of peptides over a larger dynamic concentration range compared to widely used proteomics preprocessing tools. The performance of the pipeline is also demonstrated in a biological human serum dataset for the identification of gender-related proteins.

    Bioinformatics and Statistics: LC-MS(/MS) Data Preprocessing for Biomarker Discovery

    This chapter provides an overview of the main steps of LC-MS(/MS) data pre-processing workflows. It discusses the main characteristics of these steps and provides a detailed functional description of the currently available algorithmic approaches. As an example, the chapter presents the main steps of the Threshold Avoiding Proteomics Pipeline, which includes several novel concepts to increase the accuracy of peptide quantification and to increase the extracted dynamic concentration range of compounds. The chapter further outlines a quality control method to assess and compare the relative performance of various LC-MS(/MS) data pre-processing workflows integrated in the msCompare framework using a set of differentially spiked LC-MS datasets. The chapter discusses the most common quantitative data pre-processing errors and provides visualization methods to identify these errors. Finally, the chapter provides an overview of future development trends of LC-MS(/MS) data pre-processing algorithm development, stressing the need for easy-to-use high-throughput bioinformatics platforms using modern parallel computational resources to alleviate current data pre-processing and analysis bottlenecks.

    Threshold-Avoiding Proteomics Pipeline

    We present a new proteomics analysis pipeline focused on maximizing the dynamic range of detected molecules in liquid chromatography-mass spectrometry (LC-MS) data and accurately quantifying low-abundance peaks to identify those with biological relevance. Although there has been much work to improve the quality of data derived from LC-MS instruments, the goal of this study was to extend the dynamic range of analyzed compounds by making full use of the information available within each data set and across multiple related chromatograms in an experiment. Our aim was to distinguish low-abundance signal peaks from noise by noting their coherent behavior across multiple data sets, and central to this is the need to delay the culling of noise peaks until the final peak-matching stage of the pipeline, when peaks from a single sample appear in the context of all others. The application of thresholds that might discard signal peaks early is thereby avoided, hence the name TAPP: threshold-avoiding proteomics pipeline. TAPP focuses on quantitative low-level processing of raw LC-MS data and includes novel preprocessing, peak detection, time alignment, and cluster-based matching. We demonstrate the performance of TAPP on biologically relevant sample data consisting of porcine cerebrospinal fluid spiked over a wide range of concentrations with horse heart cytochrome c.
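    The idea of deferring noise culling until peaks from all samples can be compared may be sketched as follows. This is a minimal illustration and not the TAPP implementation: the greedy clustering, the function name `coherent_clusters`, and the parameters `mz_tol`, `rt_tol`, and `min_samples` are all assumptions made for the example. No intensity threshold is applied to individual peaks; a peak survives only if it recurs coherently across samples.

```python
def coherent_clusters(samples, mz_tol, rt_tol, min_samples):
    """Group peaks from all samples and keep groups seen in enough samples.

    samples: list of peak lists, one per sample; each peak is (mz, rt, height).
    Hypothetical greedy clustering: a peak joins the first cluster whose
    seed peak lies within mz_tol and rt_tol, otherwise it seeds a new cluster.
    """
    clusters = []
    for sample_idx, peaks in enumerate(samples):
        for mz, rt, height in peaks:
            for cluster in clusters:
                if (abs(cluster["mz"] - mz) <= mz_tol
                        and abs(cluster["rt"] - rt) <= rt_tol):
                    cluster["samples"].add(sample_idx)
                    cluster["peaks"].append((sample_idx, mz, rt, height))
                    break
            else:
                # No existing cluster is close enough: start a new one.
                clusters.append({
                    "mz": mz,
                    "rt": rt,
                    "samples": {sample_idx},
                    "peaks": [(sample_idx, mz, rt, height)],
                })
    # Culling happens only here, after every sample has been considered.
    return [c for c in clusters if len(c["samples"]) >= min_samples]
```

    In this sketch a low-abundance peak that appears at a consistent position in every sample is retained, while isolated peaks of similar height are dropped at the final step.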

    Two-dimensional method for time aligning liquid chromatography-mass spectrometry data

    We describe a new time alignment method that takes advantage of both dimensions of LC-MS data to resolve ambiguities in peak matching while remaining computationally efficient. This approach, Warp2D, combines peak extraction with a two-dimensional correlation function to provide a reliable alignment scoring function that is insensitive to spurious peaks and background noise. One-dimensional alignment methods are often based on the total-ion-current elution profile of the spectrum and are unable to distinguish peaks of different masses. Our approach uses one-dimensional alignment in time, but with a scoring function derived from the overlap of peaks in two dimensions, thereby combining the specificity of two-dimensional methods with the computational performance of one-dimensional methods. The peaks are approximated as two-dimensional Gaussians of varying width. This approximation allows peak overlap (the measure of alignment quality) to be calculated analytically, without computationally intensive numerical integration in two dimensions. To demonstrate the general applicability of Warp2D, we chose a variety of complex samples that have substantial biological and analytical variability, including human serum and urine. We show that Warp2D works well with these diverse sample sets and with minimal tuning of parameters, based on the reduced standard deviation of peak elution times after warping. The combination of high computational speed, robustness with complex samples, and lack of need for detailed tuning makes this alignment method well suited to high-throughput LC-MS studies.
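    The analytic overlap of two Gaussian peaks mentioned above can be illustrated with a short sketch. It assumes axis-aligned (separable) Gaussians in m/z and retention time and uses the standard closed form for the integral of a product of two Gaussians; the `Peak` class and function names are hypothetical and are not taken from the Warp2D code, whose exact scoring function may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float        # peak centre in m/z
    rt: float        # peak centre in retention time
    height: float    # peak apex intensity
    sigma_mz: float  # Gaussian width in m/z
    sigma_rt: float  # Gaussian width in retention time


def _overlap_1d(mu1, s1, mu2, s2):
    """Integral over x of exp(-(x-mu1)^2/(2 s1^2)) * exp(-(x-mu2)^2/(2 s2^2))."""
    var = s1 * s1 + s2 * s2
    return (math.sqrt(2.0 * math.pi) * s1 * s2 / math.sqrt(var)
            * math.exp(-((mu1 - mu2) ** 2) / (2.0 * var)))


def peak_overlap(a, b):
    """Closed-form overlap integral of two separable 2D Gaussian peaks."""
    return (a.height * b.height
            * _overlap_1d(a.mz, a.sigma_mz, b.mz, b.sigma_mz)
            * _overlap_1d(a.rt, a.sigma_rt, b.rt, b.sigma_rt))
```

    Because each factor is a closed-form expression, scoring a candidate warp only requires re-evaluating `peak_overlap` with shifted retention times; no two-dimensional grid needs to be integrated numerically.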

    Inversion of peak elution order prevents uniform time alignment of complex liquid-chromatography coupled to mass spectrometry datasets

    Retention time alignment is one of the most challenging steps in processing LC-MS datasets of complex proteomics samples acquired within a differential profiling study. A large number of time alignment methods have been developed for accurate pre-processing of such datasets. These methods generally assume that common compounds elute in the same order but they do not test whether this assumption holds. If this assumption is not valid, alignments based on a monotonic retention time function will lose accuracy for peaks that depart from the expected order of the retention time correspondence function. To address this issue, we propose a quality control method that assesses if a pair of complex LC-MS datasets can be aligned with the same alignment performance based on statistical tests before correcting retention time shifts. The algorithm first confirms the presence of an adequate number of common peaks (> approximately 100 accurately matched peak pairs), then determines if the probability for a conserved elution order of those common peaks is sufficiently high (>0.01) and finally performs retention time alignment of two LC-MS chromatograms. This procedure was applied to LC-MS and LC-MS/MS datasets from two different inter-laboratory proteomics studies showing that a large number of common peaks in chromatograms acquired by different laboratories change elution order with considerable retention time differences.
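    To make the notion of elution order inversion concrete, the sketch below counts how often a pair of matched peaks elutes in a different order in two runs. It is a descriptive measure only (the fraction of discordant pairs, as used in Kendall-type rank statistics) and is not the statistical test described in the abstract, which is not specified here. The function names and the `min_pairs` default are assumptions; the default mirrors the "approximately 100 matched peak pairs" mentioned above.

```python
from itertools import combinations


def has_enough_common_peaks(rt_a, rt_b, min_pairs=100):
    """Check that enough matched peak pairs exist before assessing order."""
    if len(rt_a) != len(rt_b):
        raise ValueError("rt_a and rt_b must list the same matched peaks")
    return len(rt_a) >= min_pairs


def discordant_fraction(rt_a, rt_b):
    """Fraction of matched peak pairs whose elution order differs between runs.

    rt_a[i] and rt_b[i] are the retention times of the same compound
    in run A and run B. Pairs tied in either run are ignored.
    """
    if len(rt_a) != len(rt_b):
        raise ValueError("rt_a and rt_b must list the same matched peaks")
    discordant = 0
    total = 0
    for (a1, b1), (a2, b2) in combinations(zip(rt_a, rt_b), 2):
        sign = (a1 - a2) * (b1 - b2)
        if sign == 0:
            continue  # a tie carries no order information
        total += 1
        if sign < 0:
            discordant += 1
    return discordant / total if total else 0.0
```

    A fraction near zero is consistent with a monotonic retention time correspondence, whereas a large fraction suggests that a single monotonic warping function cannot align all common peaks. The pairwise loop is quadratic in the number of matched peaks, which is acceptable for a few hundred pairs.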