49 research outputs found

    Efficient visualization of high-throughput targeted proteomics experiments: TAPIR

    Motivation: Targeted mass spectrometry comprises a set of powerful methods to obtain accurate and consistent protein quantification in complex samples. To fully exploit these techniques, a cross-platform and open-source software stack based on standardized data exchange formats is required. Results: We present TAPIR, a fast and efficient Python visualization software for chromatograms and peaks identified in targeted proteomics experiments. The input formats are open, community-driven standardized data formats (mzML for raw data storage and TraML encoding the hierarchical relationships between transitions, peptides and proteins). TAPIR is scalable to proteome-wide targeted proteomics studies (as enabled by SWATH-MS), allowing researchers to visualize high-throughput datasets. The framework integrates well with existing automated analysis pipelines and can be extended beyond targeted proteomics to other types of analyses. Availability and implementation: TAPIR is available for all computing platforms under the 3-clause BSD license at https://github.com/msproteomicstools/msproteomicstools. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data

    Motivation: The determination of absolute quantities of proteins in biological samples is necessary for multiple types of scientific inquiry. While relative quantification has been commonly used in proteomics, few proteomic datasets measuring absolute protein quantities have been reported to date. Various technologies have been applied using different types of input data, e.g. ion intensities or spectral counts, as well as different absolute normalization strategies. To date, a user-friendly and transparent software supporting large-scale absolute protein quantification has been lacking. Results: We present a bioinformatics tool, termed aLFQ, which supports the commonly used absolute label-free protein abundance estimation methods (TopN, iBAQ, APEX, NSAF and SCAMPI) for LC-MS/MS proteomics data, together with validation algorithms enabling automated data analysis and error estimation. Availability and implementation: aLFQ is written in R and freely available under the GPLv3 from CRAN (http://www.cran.r-project.org). Instructions and example data are provided in the R-package. The raw data can be obtained from the PeptideAtlas raw data repository (PASS00321). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    DIANA—algorithmic improvements for analysis of data-independent acquisition MS data

    Motivation: Data-independent acquisition mass spectrometry has emerged as a reproducible and sensitive alternative in quantitative proteomics, where parsing the highly complex tandem mass spectra requires dedicated algorithms. Recently, targeted data extraction was proposed as a novel analysis strategy for this type of data, but it is important to further develop these concepts to provide quality-controlled, interference-adjusted and sensitive peptide quantification. Results: We here present the algorithm DIANA and the classifier PyProphet, which are based on new probabilistic sub-scores to classify the chromatographic peaks in targeted data-independent acquisition data analysis. The algorithm is capable of providing accurate quantitative values and increased recall at a controlled false discovery rate in a complex gold standard dataset. Importantly, we further demonstrate increased confidence gained by the use of two complementary data-independent acquisition targeted analysis algorithms, as well as increased numbers of quantified peptide precursors in complex biological samples. Availability and implementation: DIANA is implemented in Scala and Python and available as open source (Apache 2.0 license) or pre-compiled binaries from http://quantitativeproteomics.org/diana. PyProphet can be installed from PyPI (https://pypi.python.org/pypi/pyprophet). Supplementary information: Supplementary data are available at Bioinformatics online.

    BioContainers: An open-source and community-driven framework for software standardization

    Motivation: BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers, with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation: The software is freely available at github.com/BioContainers/.

    Reproducible quantitative proteotype data matrices for systems biology

    Historically, many mass spectrometry–based proteomic studies have aimed at compiling an inventory of protein compounds present in a biological sample, with the long-term objective of creating a proteome map of a species. However, to answer fundamental questions about the behavior of biological systems at the protein level, accurate and unbiased quantitative data are required in addition to a list of all protein components. Fueled by advances in mass spectrometry, the proteomics field has thus recently shifted focus toward the reproducible quantification of proteins across a large number of biological samples. This provides the foundation to move away from pure enumeration of identified proteins toward quantitative matrices of many proteins measured across multiple samples. It is argued here that data matrices consisting of highly reproducible, quantitative, and unbiased proteomic measurements across a high number of conditions, referred to here as quantitative proteotype maps, will become the fundamental currency in the field and provide the starting point for downstream biological analysis. Such proteotype data matrices, for example, are generated by the measurement of large patient cohorts, time series, or multiple experimental perturbations. They are expected to have a large effect on systems biology and personalized medicine approaches that investigate the dynamic behavior of biological systems across multiple perturbations, time points, and individuals.

    Achieving quantitative reproducibility in label-free multisite DIA experiments through multirun alignment

    DIA is a mainstream method for quantitative proteomics, but consistent quantification across multiple LC-MS/MS instruments remains a bottleneck in parallelizing data acquisition. One reason for this inconsistency and missing quantification is the retention time shift, which current software does not adequately address for runs from multiple sites. We present multirun chromatogram alignment strategies to map peaks across columns, including the traditional reference-based Star method and two novel approaches: MST and Progressive alignment. These reference-free strategies produce a quantitatively accurate data matrix, even from heterogeneous multi-column studies. Progressive alignment also generates merged chromatograms from all runs, which has not previously been achieved for LC-MS/MS data. First, we demonstrate the effectiveness of multirun alignment strategies on a gold-standard annotated dataset, resulting in a threefold reduction in quantitation error rate compared to non-aligned DIA results. Subsequently, we show on a multi-species dataset that DIAlignR effectively controls the quantitative error rate, improves precision in protein measurements, and exhibits conservative peak alignment. We next show that the MST alignment reduces cross-site CV by 50% for highly abundant proteins when applied to a dataset from 11 different LC-MS/MS setups. Finally, the reanalysis of 949 plasma runs with multirun alignment revealed a more than 50% increase in insulin resistance (IR) and respiratory viral infection (RVI) proteins, identifying 11 and 13 proteins respectively, compared to prior analysis without it. The three strategies are implemented in our DIAlignR workflow (>2.3) and can be combined with linear, non-linear, or hybrid pairwise alignment.
