167 research outputs found

    SearchGUI: a highly adaptable common interface for proteomics search and de novo engines

    Mass-spectrometry-based proteomics has become the standard approach for identifying and quantifying proteins. A vital step consists of analyzing experimentally generated mass spectra to identify the underlying peptide sequences for later mapping to the originating proteins. We here present the latest developments in SearchGUI, a common open-source interface for the most frequently used freely available proteomics search and de novo engines that has evolved into a central component in numerous bioinformatics workflows.

    Automated splitting into batches for observational biomedical studies with sequential processing

    Experimental design usually focuses on the setting where treatments and/or other aspects of interest can be manipulated. However, in observational biomedical studies with sequential processing, the set of available samples is often fixed, and the problem is thus rather the ordering and allocation of samples to batches such that comparisons between different treatments can be made with similar precision. In certain situations, this allocation can be done by hand, but this rapidly becomes impractical with more challenging cohort setups. Here, we present a fast and intuitive algorithm to generate balanced allocations of samples to batches for any single-variable model where the treatment variable is nominal. This greatly simplifies the grouping of samples into batches, makes the process reproducible, and provides a marked improvement over completely random allocations. The general challenges of allocation and why good solutions can be hard to find are also discussed, as well as potential extensions to multivariable settings.

    Introduction to opportunities and pitfalls in functional mass spectrometry based proteomics

    With the advent of mass spectrometry based proteomics, the identification of thousands of proteins has become commonplace in biology nowadays. Increasingly, efforts have also been invested toward the detection and localization of posttranslational modifications. It is furthermore common practice to quantify the identified entities, a task supported by a panel of different methods. Finally, the results can also be enriched with functional knowledge gained on the proteins, detecting for instance differentially expressed gene ontology terms or biological pathways. In this study, we review the resources, methods and tools available for the researcher to achieve such a quantitative functional analysis. These include statistics for the post-processing of identification and quantification results, online resources and public repositories. With a focus on free but user-friendly software, preferably also open-source, we provide a list of tools designed to help the researcher manage the vast amount of data generated. We also indicate where such applications currently remain lacking. Moreover, we stress the eventual pitfalls of every step of such studies. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.

    PeptideShaker Online: A User-Friendly Web-Based Framework for the Identification of Mass Spectrometry-Based Proteomics Data

    Mass spectrometry-based proteomics is a high-throughput technology generating ever-larger amounts of data per project. However, storing, processing, and interpreting these data can be a challenge. A key element in simplifying this process is the development of interactive frameworks focusing on visualization that can greatly simplify both the interpretation of data and the generation of new knowledge. Here we present PeptideShaker Online, a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the resulting data. Storage and processing of the data are performed via the versatile Galaxy platform (through SearchGUI, PeptideShaker, and moFF), while the interaction with the results happens via a locally installed web server, thus enabling researchers to process and interpret their own data without requiring advanced bioinformatics skills or direct access to compute-intensive infrastructures. The source code, additional documentation, and a fully functional demo are available at https://github.com/barsnes-group/peptide-shaker-online.

    Pladipus enables universal distributed computing in proteomics bioinformatics

    The use of proteomics bioinformatics substantially contributes to an improved understanding of proteomes, but this novel and in-depth knowledge comes at the cost of increased computational complexity. Parallelization across multiple computers, a strategy termed distributed computing, can be used to handle this increased complexity; however, setting up and maintaining a distributed computing infrastructure requires resources and skills that are not readily available to most research groups. Here we propose a free and open-source framework named Pladipus that greatly facilitates the establishment of distributed computing networks for proteomics bioinformatics tools. Pladipus is straightforward to install and operate thanks to its user-friendly graphical interface, allowing complex bioinformatics tasks to be run easily on a network instead of a single computer. As a result, any researcher can benefit from the increased computational efficiency provided by distributed computing, hence empowering them to tackle more complex bioinformatics challenges. Notably, it enables any research group to perform large-scale reprocessing of publicly available proteomics data, thus supporting the scientific community in mining these data for novel discoveries.

    Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides

    Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.

    Anatomy and evolution of database search engines — a central component of mass spectrometry based proteomic workflows

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.

    Shedding light on black boxes in protein identification

    Performing a well thought-out proteomics data analysis can be a daunting task, especially for newcomers to the field. Even researchers experienced in the proteomics field can find it challenging to follow existing publication guidelines for MS-based protein identification and characterization in detail. One of the primary goals of bioinformatics is to enable any researcher to interpret the vast amounts of data generated in modern biology, by providing user-friendly and robust end-user applications, clear documentation, and corresponding teaching materials. In that spirit, we here present an extensive tutorial for peptide and protein identification, available at http://compomics.com/bioinformatics-for-proteomics. The material is completely based on freely available and open-source tools, and has already been used and refined at numerous international courses over the past 3 years. During this time, it has demonstrated its ability to allow even complete beginners to intuitively conduct advanced bioinformatics workflows, interpret the results, and understand their context. This tutorial is thus aimed at fully empowering users, by removing black boxes in the proteomics informatics pipeline.