19,300 research outputs found

    Anatomy and evolution of database search engines: a central component of mass spectrometry based proteomic workflows

    Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.

    Protein Identification from Top-Down Mass Spectra, a Fast Filtering Algorithm

    Top-down mass spectrometry is a fairly new field of proteomics that offers excellent prospects. The computational resources for it, however, are not yet mature. This work develops a fast algorithm that, starting from a deconvoluted top-down mass spectrum and a database, determines which proteins the spectrum most probably derives from. The program is an effective filtering tool that can considerably reduce the running time of protein identification programs.

    Multiplexing in Multi-Reflecting TOF MS

    The paper presents an overview of original inventions, developments, and experimental results by the group of authors in the area of multi-reflecting time-of-flight mass spectrometry (MR-TOFMS) with Folded Flight Path (FFP®), with a main focus on multiplexing methods for improving the analysis throughput, i.e. the amount of information per unit time. MR-TOF provides panoramic spectra (a virtue of TOFMS) while significantly enhancing resolving power, thus providing yet more information. A resolving power of R = 500,000 is demonstrated to resolve isobars and to improve mass accuracy to the sub-ppm level. The Encoded Frequent Pulsing (EFP™) method improves sensitivity, expands dynamic range, and opens multiple incarnations of parallel and fast tandem methods of analysis based on using ion traps, TOFMS, and ion mobility for rapid and lossless parent ion separations.

    Urinary CE-MS peptide marker pattern for detection of solid tumors

    Urinary profiling datasets, previously acquired by capillary electrophoresis coupled to mass spectrometry, were investigated to identify a general urinary marker pattern for detection of solid tumors by targeting common systemic events associated with tumor-related inflammation. A total of 2,055 urinary profiles were analyzed, derived from a) a cancer group of patients (n = 969) with bladder, prostate, and pancreatic cancers, renal cell carcinoma, and cholangiocarcinoma and b) a control group of patients with benign diseases (n = 556), inflammatory diseases (n = 199), and healthy individuals (n = 331). Statistical analysis was conducted in a discovery set of 676 cancer cases and 744 controls. A total of 193 peptides differing at statistically significant levels between cases and controls were selected and combined into a multi-dimensional marker pattern using support vector machine algorithms. Independent validation in a set of 635 patients (293 cancer cases and 342 controls) showed an AUC of 0.82. Inclusion of age as an independent variable significantly increased the AUC value to 0.85. Among the identified peptides were mucins, fibrinogen, and collagen fragments. Further studies are planned to assess the value of the pattern for monitoring patients for tumor recurrence. In this proof-of-concept study, a general tumor marker pattern was developed to detect cancer based on shared biomarkers, likely indicative of cancer-related features.
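
The cohort sizes reported in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the study: the variable and function names are illustrative assumptions, and every number is copied from the abstract itself.

```python
# Cohort sizes as reported in the abstract (names are illustrative assumptions).
CANCER_TOTAL = 969
CONTROL_GROUPS = {"benign": 556, "inflammatory": 199, "healthy": 331}
ALL_PROFILES = 2055

DISCOVERY = {"cancer": 676, "control": 744}
VALIDATION = {"cancer": 293, "control": 342}


def cohort_summary():
    """Return derived totals so the reported split can be checked for consistency."""
    control_total = sum(CONTROL_GROUPS.values())
    return {
        "control_total": control_total,
        "all_profiles": CANCER_TOTAL + control_total,
        "discovery_total": sum(DISCOVERY.values()),
        "validation_total": sum(VALIDATION.values()),
        # Cancer cases and controls, each summed over discovery + validation.
        "cancer_split": DISCOVERY["cancer"] + VALIDATION["cancer"],
        "control_split": DISCOVERY["control"] + VALIDATION["control"],
    }


if __name__ == "__main__":
    for name, value in cohort_summary().items():
        print(f"{name}: {value}")
```

Under these figures, the discovery and validation sets together account for all 2,055 profiles, and the validation set size matches the 635 patients stated in the abstract.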

    Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration

    Background: Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of proteins correctly identified depends on the ability to include new knowledge about the mass spectrometry fragmentation process in computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset. Results: We present Harvest: an open source software tool for analysing fragmentation patterns and assessing the power of a new piece of information about the MS/MS fragmentation process to more clearly differentiate between correct and random peptide assignments. We demonstrate this functionality using data metrics derived from the properties of individual datasets in a peptide identification context. Using Harvest, we demonstrate how the development of such metrics may improve correct peptide assignment confidence in the context of a high-throughput proteomics experiment and characterise properties of peptide fragmentation. Conclusions: Harvest provides a simple framework in C++ for analysing and prototyping metrics for peptide matching, the core of the protein identification problem. It is not a protein identification package and answers a different research question from packages such as Sequest, Mascot, and X!Tandem. It does not aim to maximise the number of assigned peptides from a set of unknown spectra, but instead provides a method by which researchers can explore fragmentation properties and assess the power of novel metrics for peptide matching in the context of a given experiment. Metrics developed using Harvest may then become candidates for later integration into protein identification packages.

    Computational Methods for Protein Identification from Mass Spectrometry Data

    Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, is available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure that as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification is also discussed, with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem, as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.

    Computational Framework for Data-Independent Acquisition Proteomics.

    Mass spectrometry (MS) is one of the main techniques for high-throughput discovery-based and targeted proteomics experiments. The most popular method for MS data acquisition has been the data dependent acquisition (DDA) strategy, which primarily selects high abundance peptides for MS/MS sequencing. DDA incorporates stochastic data acquisition to avoid repetitive sequencing of the same peptide, resulting in relatively irreproducible results for low abundance peptides between experiments. Data independent acquisition (DIA), in which peptide fragment signals are systematically acquired, is emerging as a promising alternative that addresses the stochasticity of DDA. DIA results in more complex signals, posing computational challenges for complex samples and high-throughput analysis. As a result, targeted extraction, which requires pre-existing spectral libraries, has been the most commonly used approach for automated DIA data analysis. However, building spectral libraries requires additional analysis time and sample material, which are major barriers for most research groups. In my dissertation, I develop a computational tool called DIA-Umpire, which includes computational and signal processing algorithms to enable untargeted DIA identification and quantification analysis without any prior spectral library. In the first study, a signal feature detection algorithm is developed to extract and assemble peptide precursor and fragment signals into pseudo MS/MS spectra, which can be analyzed by existing DDA untargeted analysis tools. This novel step enables direct and untargeted (spectral library-free) DIA identification analysis, and we show the performance using complex samples including human cell lysate and glycoproteomics datasets. In the second study, a hybrid approach is developed to further improve the DIA quantification sensitivity and reproducibility. The performance of the DIA-Umpire quantification approach is demonstrated using an affinity-purification mass spectrometry experiment for protein-protein interaction analysis. Lastly, in the third study, I improve the DIA-Umpire pipeline for data obtained from the Orbitrap family of mass spectrometers. Using public datasets, I show that the improved version of DIA-Umpire is capable of highly sensitive, untargeted analysis of DIA data generated on the Orbitrap family of mass spectrometers. The dissertation work addresses the barriers of DIA analysis and should facilitate the adoption of the DIA strategy for a broad range of discovery proteomics applications.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120699/1/tsouc_1.pd

    DEVELOPMENT AND APPLICATION OF MASS SPECTROMETRY-BASED PROTEOMICS TO GENERATE AND NAVIGATE THE PROTEOMES OF THE GENUS POPULUS

    Historically, there has been tremendous synergy between biology and analytical technology, such that one drives the development of the other. Over the past two decades, their interrelatedness has catalyzed entirely new experimental approaches and unlocked new types of biological questions, as exemplified by the advancements in the field of mass spectrometry (MS)-based proteomics. MS-based proteomics, which provides a more complete measurement of all the proteins in a cell, has revolutionized a variety of scientific fields, ranging from characterizing proteins expressed by a microorganism to tracking cancer-related biomarkers. Though MS technology has advanced significantly, the analysis of complicated proteomes, such as those of plants or humans, remains challenging because of the incongruity between the complexity of the biological samples and the analytical techniques available. In this dissertation, analytical methods utilizing state-of-the-art MS instrumentation have been developed to address challenges associated with both qualitative and quantitative characterization of eukaryotic organisms. In particular, these efforts focus on characterizing Populus, a model organism and potential feedstock for bioenergy. The effectiveness of pre-existing MS techniques, initially developed to identify proteins reliably in microbial proteomes, was tested to define the boundaries and characterize the landscape of functional genome expression in Populus. Although these approaches were generally successful, achieving maximal proteome coverage was still limited by a number of factors, including genome complexity, the dynamic range of protein identification, and the abundance of protein variants. To overcome these challenges, improvements were needed in sample preparation, MS instrumentation, and bioinformatics. Optimization of experimental procedures and implementation of current state-of-the-art instrumentation afforded the most detailed look into the predicted proteome space of Populus, offering varying proteome perspectives: 1) network-wide, 2) pathway-specific, and 3) protein-level viewpoints. In addition, we implemented two bioinformatic approaches that were capable of decoding the plasticity of the Populus proteome, facilitating the identification of single amino acid polymorphisms and generating a more accurate profile of protein expression. Though the methods and results presented in this dissertation have direct implications for bioenergy research, more broadly this dissertation focuses on developing techniques to contend with the notorious challenges associated with protein characterization in all eukaryotic organisms.