1,105 research outputs found

    Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem

    Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, is still a challenge even for state-of-the-art algorithmic approaches. In this paper we present Antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. Antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen's k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and performs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically for a dataset of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that Antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM both in terms of run time and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. Antilope will be freely available as part of the open source proteomics library OpenMS.
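To make the spectrum graph model mentioned in this abstract concrete, here is a minimal sketch. It is not Antilope's method: it replaces the Lagrangian relaxation and k shortest paths machinery with a plain dynamic-programming search for a single best path. The function name `best_path`, the five-letter residue alphabet, the node scores, and the tolerance are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Monoisotopic residue masses (Da) for a small illustrative alphabet (assumption:
# a real tool would use all amino acids and handle modifications).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841, "L": 113.08406,
}

def best_path(nodes: List[Tuple[float, float]], tol: float = 0.01) -> Tuple[float, str]:
    """Find the highest-scoring source-to-sink path in a spectrum graph.

    nodes: (prefix_mass, score) pairs. After sorting, the first node is treated
    as the source (mass 0) and the last as the sink (total residue mass).
    An edge i -> j exists when the mass difference matches one residue mass.
    Returns (path_score, sequence); (-inf, "") if the sink is unreachable.
    """
    nodes = sorted(nodes)
    n = len(nodes)
    neg_inf = float("-inf")
    best = [neg_inf] * n
    back: List[Optional[Tuple[int, str]]] = [None] * n
    best[0] = nodes[0][1]
    for j in range(1, n):
        for i in range(j):
            if best[i] == neg_inf:
                continue  # node i is not reachable from the source
            diff = nodes[j][0] - nodes[i][0]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    cand = best[i] + nodes[j][1]
                    if cand > best[j]:
                        best[j] = cand
                        back[j] = (i, aa)
    if best[-1] == neg_inf:
        return neg_inf, ""
    # Walk the back-pointers from the sink to the source.
    seq: List[str] = []
    j = n - 1
    while back[j] is not None:
        i, aa = back[j]
        seq.append(aa)
        j = i
    return best[-1], "".join(reversed(seq))

# Hypothetical nodes for the peptide G-A-S plus one noise node at 99.06841.
example_nodes = [(0.0, 0.0), (57.02146, 1.0), (99.06841, 0.5),
                 (128.05857, 1.0), (215.09060, 0.0)]
score, sequence = best_path(example_nodes)
```

Since the nodes are sorted by mass, every edge points from a lighter to a heavier node, so the graph is acyclic and one left-to-right pass is enough.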

    Exploiting fragment-ion complementarity for peptide de novo sequencing from collision induced dissociation tandem mass spectra

    Thesis (Master)--Izmir Institute of Technology, Molecular Biology and Genetics, Izmir, 2011. Includes bibliographical references (leaves: 58-64). Text in English; abstract in Turkish and English. x, 64 leaves. Peptide identification from mass spectrometric data is a key step in proteomics because this field provides sequence, quantitative, and modification data of actually expressed proteins. Two approaches are generally deployed to interpret experimental MS/MS data: database searching and de novo sequencing. The database search method has been used successfully in proteomics projects for organisms with well-studied genomes. However, it is not applicable in situations where a target sequence is not in the protein database. This can happen for a number of reasons, including novel proteins, protein mutations and post-translational modifications. Because of these disadvantages of the database search method, a lot of research has focused on de novo sequencing, which assigns amino acid sequences to MS/MS spectra without the need for a database. The aim of this study is to enhance the accuracy of de novo sequencing tools. One step commonly employed in all de novo sequencing tools is the naming of fragment ions. It is essential to know which peak represents which ion type in order to traverse a spectrum graph to find an amino acid sequence that best explains the MS/MS spectrum. Different approaches have been tried to name ions and some success has been achieved in naming b-type ions and y-type ions. We have presented a new approach which enables the naming of not only b- and y-type ions but other arbitrary ion types as well. This enabled the detection of the b-ion ladder. In the latter case, missing fragments were determined by using other named ion types. Furthermore, unexplained data in tandem mass spectra were reduced as much as possible. Therefore, a complete sequence will be derived by the new approach.
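The fragment-ion complementarity named in this title can be illustrated with a short sketch. It is not the thesis's algorithm, only the standard mass relation it builds on: for singly charged fragments of a peptide with neutral monoisotopic mass M, a b ion and its complementary y ion satisfy m/z(b) + m/z(y) = M + 2 x (proton mass). The function name `complementary_pairs`, the tolerance, and the example peak list are assumptions.

```python
from typing import List, Tuple

PROTON = 1.007276  # proton mass in Da

def complementary_pairs(peaks: List[float], neutral_mass: float,
                        tol: float = 0.02) -> List[Tuple[float, float]]:
    """Return pairs of singly charged peaks whose m/z values sum to
    neutral_mass + 2 * PROTON within tol, using a two-pointer scan.

    This greedy scan uses each peak at most once; it is a simplification
    and may miss alternative pairings when several peaks fall within tol.
    """
    target = neutral_mass + 2 * PROTON
    ordered = sorted(peaks)
    pairs: List[Tuple[float, float]] = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        total = ordered[lo] + ordered[hi]
        if abs(total - target) <= tol:
            pairs.append((ordered[lo], ordered[hi]))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs

# Hypothetical singly charged peaks for the peptide G-A-S (neutral mass
# 233.101165 Da): b1, y1, b2, one noise peak, y2.
example_peaks = [58.028736, 106.049871, 129.065846, 150.0, 177.086981]
pairs = complementary_pairs(example_peaks, neutral_mass=233.101165)
```

A peak that finds a partner this way is more likely to be a genuine b or y fragment, which is one reason complementarity is useful when naming ions.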

    De novo sequencing of heparan sulfate saccharides using high-resolution tandem mass spectrometry

    Heparan sulfate (HS) is a class of linear, sulfated polysaccharides located on cell surfaces, in secretory granules, and in extracellular matrices found in all animal organ systems. It consists of alternately repeating disaccharide units, expressed in animal species ranging from hydra to higher vertebrates including humans. HS binds and mediates the biological activities of over 300 proteins, including growth factors, enzymes, chemokines, cytokines, adhesion and structural proteins, lipoproteins and amyloid proteins. The binding events largely depend on the fine structure - the arrangement of sulfate groups and other variations - on HS chains. With the activated electron dissociation (ExD) high-resolution tandem mass spectrometry technique, researchers acquire rich structural information about the HS molecule. Using this technique, covalent bonds of the HS oligosaccharide ions are dissociated in the mass spectrometer. However, this information is complex, owing to the large number of product ions, and contains a degree of ambiguity due to the overlapping of product ion masses and lability of sulfate groups; as a result, there is a serious barrier to manual interpretation of the spectra. The interpretation of such data creates a serious bottleneck to the understanding of the biological roles of HS. In order to solve this problem, I designed HS-SEQ - the first HS sequencing algorithm using high-resolution tandem mass spectrometry. HS-SEQ allows rapid and confident sequencing of HS chains from millions of candidate structures and I validated its performance using multiple known pure standards. In many cases, HS oligosaccharides exist as mixtures of sulfation positional isomers. I therefore designed MULTI-HS-SEQ, an extended version of HS-SEQ targeting spectra coming from more than one HS sequence. I also developed several pre-processing and post-processing modules to support the automatic identification of HS structure. These methods and tools demonstrated the capacity for large-scale HS sequencing, which should contribute to clarifying the rich information encoded by HS chains as well as developing tailored HS drugs to target a wide spectrum of diseases.

    QuasiNovo: Algorithms for De Novo Peptide Sequencing

    High-throughput proteomics analysis involves the rapid identification and characterization of large sets of proteins in complex biological samples. Tandem mass spectrometry (MS/MS) has become the leading approach for the experimental identification of proteins. Accurate analysis of the data produced is a computationally challenging process that relies on a complex understanding of molecular dynamics, signal processing, and pattern classification. In this work we address these modeling and classification problems, and introduce an additional data-driven evolutionary information source into the analysis pipeline. The particular problem being solved is peptide sequencing via MS/MS. The objective in solving this problem is to decipher the amino acid sequence of digested proteins (peptides) from the MS/MS spectra produced in a typical experimental protocol. Our approach sequences peptides using only the information contained in the experimental spectrum (de novo) and distributions of amino acid usage learned from large sets of protein sequence data. In this dissertation we pursue three main objectives: an ion classifier based on a neural network which selects informative ions from the spectrum, a peptide sequencer which uses dynamic programming and a scoring function to generate candidate peptide sequences, and a candidate peptide scoring function. Candidate peptide sequences are generated via a dynamic programming graph algorithm, and then scored using a combination of the neural network score, the amino acid usage score, and an edge frequency score. In addition to a complete de novo peptide sequencer, we also examine the use of amino acid usage models independently for reranking candidate peptides

    De novo sequencing of MS/MS spectra

    Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field. © 2011 Expert Reviews Ltd. The Turkish Academy of Science (TÜBA

    De novo sequencing of proteins by mass spectrometry

    Introduction: Proteins are crucial for every cellular activity and unraveling their sequence and structure is a crucial step to fully understand their biology. Early methods of protein sequencing were mainly based on the use of enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and with the expansion of the information available for each protein, various databases containing this sequence information were formed. Areas covered: De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques are covered, along with the various software currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for interpretation of protein sequence data. Expert opinion: As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which then would be able to provide improved identification. publishe

    Applications of graph theory in protein structure identification

    There is a growing interest in the identification of proteins on a proteome-wide scale. Among the different kinds of protein structure identification methods, graph-theoretic methods are particularly effective. Due to their lower costs, higher effectiveness and many other advantages, they have drawn increasing attention from researchers. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods for solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models including homology modeling based on clique finding, identification of side-chain clusters in protein structures based on the graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities of each method are given.

    PARPST: a PARallel algorithm to find peptide sequence tags

    Background: Protein identification is one of the most challenging problems in proteomics. Tandem mass spectrometry provides an important tool to handle the protein identification problem. Results: We developed a work-efficient parallel algorithm for the peptide sequence tag problem. The algorithm runs on the concurrent-read, exclusive-write PRAM in O(n) time using log n processors, where n is the number of mass peaks in the spectrum. The algorithm is able to find all the sequence tags having score greater than a parameter or all the sequence tags of maximum length. Our tests on 1507 spectra in the Open Proteomics Database show that our algorithm is efficient and effective, since it achieves results comparable to other methods. Conclusions: The proposed algorithm can be used to speed up the database searching or to identify post-translational modifications, by comparing the homology of the sequence tags found with the sequences in the biological database.
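The peptide sequence tag problem in this abstract can be illustrated with a sequential sketch. This is not the PARPST algorithm: it is not parallel and does not reproduce the PRAM bounds quoted above. It only enumerates the maximum-length tags, where a tag is a chain of peaks whose consecutive mass differences each match a residue mass. The name `longest_tags`, the four-letter alphabet, the tolerance, and the peak list are assumptions.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) for a small illustrative alphabet (assumption).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
}

def longest_tags(peaks: List[float], tol: float = 0.01) -> List[str]:
    """Return all sequence tags of maximum length, sorted alphabetically.

    tags[j] holds every longest tag that ends at peak j; all entries of
    tags[j] have the same length. The number of tags can grow quickly on
    dense spectra, so this is only suitable for small examples.
    """
    ordered = sorted(peaks)
    n = len(ordered)
    tags: List[List[str]] = [[""] for _ in range(n)]
    for j in range(n):
        for i in range(j):
            diff = ordered[j] - ordered[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    extended = [t + aa for t in tags[i]]
                    current_len = len(tags[j][0])
                    if len(extended[0]) > current_len:
                        tags[j] = extended          # strictly longer: replace
                    elif len(extended[0]) == current_len:
                        tags[j].extend(extended)    # same length: keep both
    best = max((len(ts[0]) for ts in tags), default=0)
    if best == 0:
        return []
    return sorted({t for ts in tags for t in ts if len(t) == best})

# Hypothetical peaks: a G-A-S ladder starting at 100.0 plus a side peak (100.0 + V).
example_peaks = [100.0, 157.02146, 199.06841, 228.05857, 315.09060]
found_tags = longest_tags(example_peaks)
```

Tags found this way could then be matched against database sequences, which is the use case the abstract's conclusions describe.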