
    A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry

    Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged prefix and suffix subsequences and then measures the mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph G=(V,E), where |V|=2k+2, we can solve this problem in O(|V|+|E|) time and O(|V|) space using dynamic programming. Our approach can further be used to discover a modified amino acid in O(|V||E|) time and to analyze data with other types of noise in O(|V||E|) time. Our algorithms have been implemented and tested on actual experimental data.
    Comment: A preliminary version appeared in Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 389-398, 200
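    The core idea, reading each observed ion mass both as a prefix and as a suffix and then finding a path of amino-acid mass steps from 0 to the peptide mass, can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes the ion masses are already converted to plain residue-mass sums (no proton or water offsets), it ignores the constraint that a path must not use both interpretations of the same peak, and it builds edges by naive pairwise comparison, so it runs in O(|V|^2) rather than O(|V|+|E|). The names RESIDUE_MASS, nc_vertices, residue_for_gap, and longest_path_sequence are hypothetical.

```python
# Simplified NC-spectrum-graph sketch (hypothetical names; not the paper's code).
# Monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def nc_vertices(ion_masses, total_mass):
    """Read each mass x both as an N-terminal prefix (x) and as a C-terminal
    suffix (total_mass - x); 0 and total_mass are the path endpoints."""
    verts = {0.0, round(total_mass, 5)}
    for x in ion_masses:
        verts.add(round(x, 5))
        verts.add(round(total_mass - x, 5))
    return sorted(verts)


def residue_for_gap(gap, tol):
    """Return an amino acid whose residue mass matches the gap, or None."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(gap - mass) <= tol:
            return aa
    return None


def longest_path_sequence(ion_masses, total_mass, tol=0.02):
    """DP for the 0 -> total_mass path visiting the most vertices."""
    verts = nc_vertices(ion_masses, total_mass)
    n = len(verts)
    best = [None] * n  # best[j]: most vertices on any path from 0 to verts[j]
    back = [None] * n  # back[j]: (predecessor index, residue on that edge)
    best[0] = 1
    for j in range(1, n):  # vertices are processed in increasing mass order
        for i in range(j):
            if best[i] is None:
                continue
            aa = residue_for_gap(verts[j] - verts[i], tol)
            if aa is not None and (best[j] is None or best[i] + 1 > best[j]):
                best[j] = best[i] + 1
                back[j] = (i, aa)
    if best[-1] is None:
        return None  # no chain of residue masses reaches the peptide mass
    seq, j = [], n - 1
    while j != 0:
        j, aa = back[j]
        seq.append(aa)
    return "".join(reversed(seq))
```

    For example, for the residue masses of GASP with observed masses 57.02146, 184.08479, and 215.0906, the sketch returns "GASP". Because this simplified graph is symmetric about half the peptide mass, the reversed path ("PSAG") scores the same, so the sketch can only recover the sequence up to reversal and breaks the tie by keeping the first predecessor found.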

    De novo sequencing of MS/MS spectra

    Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful for investigating many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, suitable databases are often unavailable or, even when available, some sequences are not readily found in them. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing, and a selection of them is detailed in this article. Although the field has not agreed on a standard accuracy measure, relative algorithm performance is discussed. The current state of de novo sequencing is then assessed and, finally, examples are used to outline possible future perspectives for the field.
    © 2011 Expert Reviews Ltd.
    The Turkish Academy of Sciences (TÜBA)

    Exploiting fragment-ion complementarity for peptide de novo sequencing from collision induced dissociation tandem mass spectra

    Thesis (Master)--Izmir Institute of Technology, Molecular Biology and Genetics, Izmir, 2011. Includes bibliographical references (leaves 58-64). Text in English; Abstract: Turkish and English. x, 64 leaves.
    Peptide identification from mass spectrometric data is a key step in proteomics because it provides sequence, quantitative, and modification data for the proteins that are actually expressed. Two approaches are generally used to interpret experimental MS/MS data: database searching and de novo sequencing. Database searching has been used successfully in proteomics projects for organisms with well-studied genomes. However, it is not applicable when a target sequence is absent from the protein database, which can happen for a number of reasons, including novel proteins, protein mutations, and post-translational modifications. Because of these limitations of database searching, much research has focused on de novo sequencing, which assigns amino acid sequences to MS/MS spectra without the need for a database. The aim of this study is to improve the accuracy of de novo sequencing tools. One step common to all de novo sequencing tools is the naming of fragment ions: it is essential to know which peak represents which ion type in order to traverse a spectrum graph and find the amino acid sequence that best explains the MS/MS spectrum. Different approaches to ion naming have been tried, with some success for b-type and y-type ions. We present a new approach that names not only b- and y-type ions but other arbitrary ion types as well. This enabled detection of the b-ion ladder; where the ladder was incomplete, missing fragments were determined using other named ion types. Furthermore, unexplained data in the tandem mass spectra were reduced as much as possible. As a result, a complete sequence can be derived with the new approach.
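    The complementarity that the title refers to can be illustrated with a standard mass relation: for singly charged ions, a b-ion (prefix residues plus a proton) and its complementary y-ion (suffix residues plus water plus a proton) together sum to the neutral peptide mass M plus two proton masses. The sketch below pairs peaks that satisfy this relation. It is a generic illustration under that assumption, not the thesis's ion-naming method; the function name complementary_pairs is hypothetical, and the greedy two-pointer scan may miss pairs when several peaks fall within the tolerance.

```python
PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # monoisotopic H2O, Da (part of every y-ion and of M)


def complementary_pairs(peaks, neutral_peptide_mass, tol=0.02):
    """Pair singly charged peaks whose m/z values sum to M + 2 * PROTON,
    the expected sum for a complementary b/y pair. Either member of a
    returned pair may be the b-ion or the y-ion."""
    target = neutral_peptide_mass + 2 * PROTON
    mz = sorted(peaks)
    pairs = []
    lo, hi = 0, len(mz) - 1
    while lo < hi:  # greedy two-pointer scan over sorted m/z values
        total = mz[lo] + mz[hi]
        if abs(total - target) <= tol:
            pairs.append((mz[lo], mz[hi]))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs
```

    Peaks left unpaired (for instance noise, or a fragment whose partner was not observed) are exactly the ones for which other ion types would have to be consulted.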

    Ray: A profile-based approach for homology matching of tandem-ms spectra to sequence databases

    Thesis (Master)--Izmir Institute of Technology, Biotechnology, Izmir, 2012. Includes bibliographical references (leaves 46-50). Text in English; Abstract: Turkish and English. xii, 50 leaves.
    Mass spectrometry is a tool commonly used in proteomics to identify and quantify proteins. Thousands of spectra can be obtained in just a few hours, and computational methods make it possible to analyze the data from such high-throughput studies. There are two main strategies: database search and de novo sequencing. Most researchers prefer database search as a first choice, but even slight changes to a protein can prevent identification. In such cases, de novo sequencing can be used. However, this approach depends heavily on spectral quality, and full-length sequence predictions are difficult to achieve. Peptide sequence tags (PSTs) allow some flexibility in database searches. A PST is a short amino acid sequence with associated mass information, but obtaining accurate PSTs is still arduous. When a sequence is missing from the database, homology searches can be useful. Homology search algorithms such as MS-BLAST, MS-Shotgun, and FASTS exist, but they are modified versions of existing algorithms; for example, MS-BLAST is BLAST adapted to mass spectrometric data. Moreover, they are usually coupled with de novo sequencing, which still has limitations. Therefore, novel algorithms are needed to broaden the scope of homology searches. For this purpose, a novel approach based on sequence profiles has been implemented. A sequence profile is a table that contains the frequencies of all possible amino acids for a given MS/MS spectrum. These profiles are then aligned to sequences in the database. Profiles are more specific than PSTs, and they remove the need for precursor mass restrictions or enzyme information.
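    To make the profile idea concrete, the sketch below scores a profile against every window of a database sequence and reports the best hit. It assumes a position-specific profile (one column of amino-acid frequencies per position) and a simple ungapped log-frequency score with a pseudocount; the abstract does not describe Ray's actual scoring or alignment, so this is only an illustration, and profile_score and best_ungapped_hit are hypothetical names.

```python
import math


def profile_score(profile, window, pseudocount=0.01):
    """Sum of log frequencies of each residue in its profile column.
    The pseudocount keeps unseen residues from giving log(0)."""
    return sum(math.log(col.get(aa, 0.0) + pseudocount)
               for col, aa in zip(profile, window))


def best_ungapped_hit(profile, sequence, pseudocount=0.01):
    """Slide the profile along the sequence without gaps and return
    (best score, start index), or None if the sequence is too short."""
    length = len(profile)
    best = None
    for start in range(len(sequence) - length + 1):
        score = profile_score(profile, sequence[start:start + length], pseudocount)
        if best is None or score > best[0]:
            best = (score, start)
    return best
```

    A gapped alignment (for example, a dynamic-programming profile alignment) would be the natural next step when insertions or deletions relative to the database sequence are expected.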

    De novo sequencing of proteins by mass spectrometry

    Introduction: Proteins are essential to every cellular activity, and unraveling their sequence and structure is a crucial step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the Human Genome Project and the expansion of the information available for each protein, various databases containing this sequence information were created.
    Areas covered: This review covers de novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques, along with the various software tools currently available for proteogenomic analysis. Emphasis is placed on methods for de novo sequencing, together with the potential and shortcomings of using databases to interpret protein sequence data.
    Expert opinion: As mass-spectrometry sequencing performance improves through better software and hardware, combined with user-friendly interfaces, de novo protein sequencing is becoming imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate (FDR) searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should be integrated into standard proteomic workflows as an add-on to conventional database search engines, which would then provide improved identification.
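    The target/decoy strategy mentioned above estimates the false discovery rate by searching spectra against decoy sequences that should produce only false matches. The sketch below uses the simple decoy-to-target ratio, which is one common estimator (others exist), and picks the lowest score cutoff whose estimate stays within a bound. The function names and the (score, is_decoy) input format are hypothetical, and a production pipeline would more likely compute monotone q-values.

```python
def estimated_fdr(psms, threshold):
    """psms: list of (score, is_decoy) peptide-spectrum matches.
    Estimate FDR among matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, decoy in psms if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= threshold and decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR is <= max_fdr, or None."""
    cutoff = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if estimated_fdr(psms, threshold) <= max_fdr:
            cutoff = threshold
    return cutoff
```

    For instance, with target matches scoring 10, 9, 8, and 6 and decoy matches scoring 7 and 5, a 1% bound yields a cutoff of 8, while a looser 30% bound admits the target at 6.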

    Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem

    Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, is still a challenge even for state-of-the-art algorithmic approaches. In this paper we present Antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. Antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen's k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and runs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically on a dataset of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that Antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM in terms of both run time and accuracy. Furthermore, it offers increased flexibility in the number of ion types considered. Antilope will be freely available as part of the open-source proteomics library OpenMS.
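    For readers unfamiliar with Yen's algorithm, the sketch below is a plain, generic version of it on a small directed graph, using Dijkstra for each spur-path search. It is not Antilope's adaptation (which is combined with Lagrangian relaxation of an ILP); edge costs are assumed to be non-negative, as Dijkstra requires, and the graph format and function names are hypothetical.

```python
import heapq


def dijkstra(graph, src, dst, banned_nodes, banned_edges):
    """Cheapest src -> dst path avoiding banned nodes/edges.
    graph: {u: {v: cost}} with non-negative costs. Returns (cost, path) or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def path_cost(graph, path):
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def yen_k_shortest(graph, src, dst, k):
    """Up to k loopless src -> dst paths in order of increasing cost."""
    first = dijkstra(graph, src, dst, set(), set())
    if first is None:
        return []
    found = [first]
    candidates = []  # min-heap of (cost, path)
    while len(found) < k:
        _, last_path = found[-1]
        for i in range(len(last_path) - 1):
            spur, root = last_path[i], last_path[: i + 1]
            # Ban the next edge of every accepted path sharing this root.
            banned_edges = {(p[i], p[i + 1]) for _, p in found
                            if len(p) > i + 1 and p[: i + 1] == root}
            spur_result = dijkstra(graph, spur, dst, set(root[:-1]), banned_edges)
            if spur_result is None:
                continue
            total = root[:-1] + spur_result[1]
            entry = (path_cost(graph, total), total)
            if entry not in candidates and all(total != p for _, p in found):
                heapq.heappush(candidates, entry)
        if not candidates:
            break  # fewer than k paths exist
        found.append(heapq.heappop(candidates))
    return found
```

    In a spectrum-graph setting, returning several near-optimal paths rather than only the best one lets a downstream scoring step re-rank candidate peptides.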