    Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem

    Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, remains a challenge even for state-of-the-art algorithmic approaches. In this paper we present Antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. Antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen's k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and performs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically for a dataset of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that Antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM in terms of both run time and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. Antilope will be freely available as part of the open source proteomics library OpenMS
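The spectrum graph model mentioned in the abstract above can be illustrated with a minimal sketch. This is not Antilope's method (which pairs Lagrangian relaxation with Yen's k shortest paths); it is only a plain dynamic-programming search for the single best path, under several assumptions: nodes are candidate prefix masses sorted in ascending order, each node carries a precomputed score, and the residue alphabet is a small illustrative subset. The names `RESIDUE_MASS`, `best_path`, and `tol` are hypothetical.

```python
# Monoisotopic residue masses (Da) for a small illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}


def best_path(prefix_masses, scores, tol=0.01):
    """Return (score, sequence) for the best path from the first to the last node.

    prefix_masses must be sorted ascending; an edge i -> j exists when the
    mass difference matches one residue mass within `tol`.
    Returns None when the last node cannot be reached.
    """
    n = len(prefix_masses)
    unreachable = float("-inf")
    best = [unreachable] * n
    back = [None] * n
    best[0] = scores[0]
    for j in range(1, n):
        for i in range(j):
            if best[i] == unreachable:
                continue
            diff = prefix_masses[j] - prefix_masses[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol and best[i] + scores[j] > best[j]:
                    best[j] = best[i] + scores[j]
                    back[j] = (i, aa)
    if best[-1] == unreachable:
        return None
    # Walk the back-pointers from the last node to recover the sequence.
    seq = []
    j = n - 1
    while back[j] is not None:
        j, aa = back[j]
        seq.append(aa)
    return best[-1], "".join(reversed(seq))
```

A real spectrum graph would also have to forbid using both the b-ion and y-ion interpretation of the same peak, which is the kind of constraint that motivates an integer linear programming formulation instead of a plain path search.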

    De novo sequencing of MS/MS spectra

    Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful for investigating many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are often not available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing, and a selection of them is detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field. © 2011 Expert Reviews Ltd. The Turkish Academy of Science (TÜBA

    N-terminal Imine Derivatization for Enhanced De Novo Peptide Sequencing: A Study of the Fragmentation Pattern Generated from CID of Peptide-Imines

    In this work, the fragmentation patterns derived from model peptides derivatized to create N-terminal imines (Schiff bases) were evaluated. Collision-induced dissociation (CID) of the protonated peptide-imines ([M+H]+) generally produced complete series of the sequence-informative aₙ and bₙ ions, now undoubtedly characteristic of the peptide ion species. A novel product ion was also observed, denoted the yǂ ion, determined by IRMPD spectroscopy and density functional theory to be generated from the elimination of the N-terminal amino acid residue despite the N-terminal modification. It was concluded that the pathway involved a nucleophilic attack by an amide nitrogen and the possible formation of an imidazole-4-one intermediate, which collapses to generate a truncated, protonated peptide-imine with a conserved primary sequence. N-terminal imine modification was also observed to eliminate sequence scrambling events, presumably by eliminating the macrocyclic b ion mechanism implicated in the sequence rearrangements. Additionally, the CID mass spectra of Ag-cationized imine-modified peptides were obtained. An apparent even-electron [M-H]+ peptide ion was observed, determined to be generated by the loss of AgH. The abstracted hydrogen was explicitly identified as originating from the imine carbon of the argentinated modified peptide. CID of the [M-H]+ ions generated sequence ions analogous to those produced from the [M+H]+ species of imine-modified peptides, though less extensively

    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.

    Algorithms for Glycan Structure Identification with Tandem Mass Spectrometry

    Glycosylation is a frequently observed post-translational modification (PTM) of proteins. It has been estimated that over half of eukaryotic proteins in nature are glycoproteins. Glycoprotein analysis plays a vital role in drug preparation. Thus, characterization of glycans that are linked to proteins has become necessary in glycoproteomics. Mass spectrometry has become an effective analytical technique for glycoproteomics analysis because of its high throughput and sensitivity. The large amount of spectral data collected in a mass spectrometry experiment makes manual interpretation impossible and requires effective computational approaches for automated analysis. Different algorithmic solutions have been proposed to address the challenges in glycoproteomics analysis based on mass spectrometry. However, new algorithms that can identify intact glycopeptides are still needed to improve result accuracy. In this research, a glycan is represented as a rooted unordered labelled tree and we focus on developing effective algorithms to determine glycan structures from tandem mass spectra. Interpreting the tandem mass spectra of glycopeptides with a de novo sequencing method is essential to identifying novel glycan structures. Thus, we mathematically formulated the glycan de novo sequencing problem and propose a heuristic algorithm for glycan de novo sequencing from HCD tandem mass spectra of glycopeptides. Characterizing glycans from MS/MS with a de novo sequencing method requires high-quality mass spectra for accurate results. Database search usually yields more reliable results because it draws on known glycan structural information. Thus, we propose a de novo sequencing assisted database search method, GlycoNovoDB, for mass spectra interpretation
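The rooted unordered labelled tree representation named in the abstract above has one practical consequence: two descriptions of the same glycan may list branches in different orders, so comparing candidates needs an order-independent form. The sketch below shows one common way to do that, by sorting child subtrees recursively. It is an illustration only, not the heuristic from the work described; the `(label, children)` encoding and the function name `canonical` are assumptions made for this example.

```python
def canonical(tree):
    """Return an order-independent form of a rooted labelled tree.

    `tree` is a pair (label, children), where children is a list of trees.
    Child subtrees are canonicalized first and then sorted, so any two
    trees that differ only in sibling order map to the same nested tuple.
    """
    label, children = tree
    return (label, tuple(sorted(canonical(child) for child in children)))


# Two encodings of the same core-fucosylated structure, branches swapped.
glycan_a = ("HexNAc", [
    ("HexNAc", [("Hex", [("Hex", []), ("Hex", [])])]),
    ("Fuc", []),
])
glycan_b = ("HexNAc", [
    ("Fuc", []),
    ("HexNAc", [("Hex", [("Hex", []), ("Hex", [])])]),
])

same_structure = canonical(glycan_a) == canonical(glycan_b)
```

Because the canonical form is a hashable nested tuple, it can also serve as a dictionary key when deduplicating candidate structures.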

    Development of new approaches for the synthesis and decoding of one-bead one-compound cyclic peptide libraries

    A great number of cellular and biological processes depend, at some level, on protein-protein interactions (PPI). Their manipulation with chemical compounds shows great potential for the discovery of new drugs. Despite the increasing demand for molecules able to selectively interrupt PPI, the development of PPI inhibitors is beset by a number of challenges, such as the large size of the interaction interface. Given the nature of this interface, the ability to mimic protein secondary structures is very important for binding a protein and inhibiting a PPI. With their peptidomimetic abilities and favorable pharmacological properties, cyclic peptides are very promising templates for discovering protein ligands and developing new PPI inhibitors. To fully exploit the great diversity accessible with cyclic peptides, the one-bead-one-compound (OBOC) combinatorial method is the most accessible and powerful approach. Unfortunately, the use of cyclic peptides in OBOC libraries is limited by difficulties in sequencing hit compounds after screening. Lacking a free N-terminal amine, cyclic peptides cannot be sequenced by Edman degradation, and they give complicated fragmentation patterns in tandem mass spectrometry (MS/MS). In this regard, we have designed and developed new convenient ring-opening approaches to prepare and decode OBOC cyclic peptide libraries. Our strategy was to introduce a cleavable residue in the macrocycle and as a linker to allow linearization of the peptides and their release from the beads for sequencing by MS/MS. First, amino acid residues sensitive to nucleophiles, ultraviolet irradiation or cyanogen bromide were introduced in a model cyclic peptide and their cleavage yields evaluated. Afterward, the most promising residues were used to design and develop tandem ring-opening/cleavage approaches to decode OBOC cyclic peptide libraries. In the first approach, a methionine residue was introduced in the macrocycle and as a linker to allow simultaneous ring-opening and cleavage from the beads upon treatment with cyanogen bromide. In the second approach, a photosensitive residue was used in the macrocycle and as a linker for a dual ring-opening/cleavage upon UV irradiation. The linear peptide generated by these approaches can be efficiently sequenced by tandem mass spectrometry. Finally, an OBOC library was prepared and screened against the HIV-1 Nef protein to identify selective ligands. The development of these methodologies will enable the use of macrocyclic compounds in OBOC libraries and is an important contribution to medicinal chemistry for the discovery of protein ligands and the development of PPI inhibitors

    Pre-processing of tandem mass spectra using machine learning methods

    Protein identification has become increasingly helpful in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into small amino acid oligomers called peptides. By determining the amino acid sequence of several peptides of a protein, its whole amino acid sequence can be inferred. Therefore, peptide identification is the first step and a central issue for protein identification. Tandem mass spectrometers can produce a large number of tandem mass spectra, which are used for peptide identification. Two issues should be addressed to improve the performance of current peptide identification algorithms. Firstly, nearly all spectra are noise-contaminated. As a result, the accuracy of peptide identification algorithms may suffer from the noise in spectra. Secondly, the majority of spectra are not identifiable because they are of too poor quality. Therefore, much time is wasted attempting to identify these unidentifiable spectra. The goal of this research is to design spectrum pre-processing algorithms that both speed up and improve the reliability of peptide identification from tandem mass spectra. Firstly, as a tandem mass spectrum is a one-dimensional signal consisting of dozens to hundreds of peaks, the majority of which are noise, a spectrum denoising algorithm is proposed to remove most noisy peaks from spectra. Experimental results show that our denoising algorithm can remove about 69% of the peaks in a spectrum as potential noise peaks. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31% and 14% for two tandem mass spectrum datasets. Next, a two-stage recursive feature elimination based on support vector machines (SVM-RFE) and a sparse logistic regression method are proposed to select the most relevant features to describe the quality of tandem mass spectra. Our methods can effectively select the most relevant features, as measured by the performance of classifiers trained with different numbers of features. Thirdly, both supervised and unsupervised machine learning methods are used for the quality assessment of tandem mass spectra. A supervised classifier (a support vector machine) can be trained to remove more than 90% of poor-quality spectra without removing more than 10% of high-quality spectra. Clustering methods such as model-based clustering are also used for quality assessment to remove the need for a labeled training dataset, and they show promising results
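To make the idea of peak denoising concrete, here is a minimal sketch of a generic heuristic: keep only the most intense peaks inside each fixed-width m/z window. The abstract above does not specify its denoising algorithm, so this is a stand-in technique and not the author's method; the function name `denoise` and the parameters `window` and `keep` are hypothetical.

```python
def denoise(peaks, window=100.0, keep=2):
    """Keep the `keep` most intense peaks in each m/z window.

    peaks is a list of (mz, intensity) pairs. Windows are consecutive
    intervals of width `window` starting at m/z 0. The surviving peaks
    are returned sorted by m/z.
    """
    buckets = {}
    for mz, intensity in peaks:
        buckets.setdefault(int(mz // window), []).append((mz, intensity))
    kept = []
    for group in buckets.values():
        # Rank peaks within the window by intensity, strongest first.
        group.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(group[:keep])
    return sorted(kept)
```

Windowing prevents a few very intense peaks in one region from crowding out weaker but informative peaks elsewhere, which a single global intensity cutoff would not guarantee.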

    De novo sequencing of heparan sulfate saccharides using high-resolution tandem mass spectrometry

    Heparan sulfate (HS) is a class of linear, sulfated polysaccharides located on cell surfaces, in secretory granules, and in extracellular matrices in all animal organ systems. It consists of alternately repeating disaccharide units and is expressed in animal species ranging from hydra to higher vertebrates, including humans. HS binds and mediates the biological activities of over 300 proteins, including growth factors, enzymes, chemokines, cytokines, adhesion and structural proteins, lipoproteins and amyloid proteins. The binding events largely depend on the fine structure - the arrangement of sulfate groups and other variations - on HS chains. With the activated electron dissociation (ExD) high-resolution tandem mass spectrometry technique, researchers acquire rich structural information about the HS molecule. Using this technique, covalent bonds of the HS oligosaccharide ions are dissociated in the mass spectrometer. However, this information is complex, owing to the large number of product ions, and contains a degree of ambiguity due to the overlapping of product ion masses and the lability of sulfate groups; as a result, there is a serious barrier to manual interpretation of the spectra. The interpretation of such data creates a serious bottleneck in understanding the biological roles of HS. To solve this problem, I designed HS-SEQ - the first HS sequencing algorithm using high-resolution tandem mass spectrometry. HS-SEQ allows rapid and confident sequencing of HS chains from millions of candidate structures, and I validated its performance using multiple known pure standards. In many cases, HS oligosaccharides exist as mixtures of sulfation positional isomers. I therefore designed MULTI-HS-SEQ, an extended version of HS-SEQ targeting spectra coming from more than one HS sequence. I also developed several pre-processing and post-processing modules to support the automatic identification of HS structure. These methods and tools demonstrated the capacity for large-scale HS sequencing, which should contribute to clarifying the rich information encoded by HS chains as well as to developing tailored HS drugs that target a wide spectrum of diseases

    Overcoming challenges of shotgun proteomics
