Search CORE

1,588 research outputs found

Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification

Author: A. A. Klammer
S. M. Reynolds
J. A. Bilmes
M. J. MacCoss
W. S. Noble
Publication venue: Oxford University Press

Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identifying proteins in complex mixtures. Proteins are digested into peptides, which are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms.
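The predictability mentioned above starts from the textbook fragmentation model: collision-induced dissociation mostly cleaves peptide bonds, producing N-terminal b-ions and C-terminal y-ions whose masses follow directly from the sequence. The Python sketch below computes singly charged b/y m/z values for an unmodified peptide from standard monoisotopic residue masses. It is only this simple baseline, not the paper's dynamic Bayesian network, and it ignores neutral losses, higher charge states and modifications.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b- and y-ion m/z values for an unmodified peptide.

    b_i holds the first i residues (N-terminal fragment); y_i holds the
    last i residues (C-terminal fragment) and carries the extra water.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Because every b-ion has a complementary y-ion (b_i + y_(n-i) equals the neutral peptide mass plus two protons), a spectrum that is "predictable" in this sense shows ladders of peaks spaced by residue masses.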

CiteSeerX

Crossref

PubMed Central

Leveraging crosslinking mass spectrometry in structural and cell biology

Author: Graziadei Andrea
Rappsilber Juri
Publication venue: Elsevier BV
Publication date: 06/01/2022

Edinburgh Research Explorer

Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem

Author: Andreotti Sandro
Klau Gunnar W.
Reinert Knut
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, remains a challenge even for state-of-the-art algorithmic approaches. In this paper we present Antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. Antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen's k-shortest-paths algorithm. It runs significantly faster than mixed integer optimization and as fast as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically on a dataset of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that Antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM in terms of both run time and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. Antilope will be freely available as part of the open-source proteomics library OpenMS.
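To make the spectrum graph idea concrete, here is a toy Python sketch: nodes are prefix masses, edges join nodes whose mass difference matches a residue mass, and Yen's algorithm enumerates the k highest-scoring paths from the N-terminal node to the C-terminal node. Because the graph is acyclic when nodes are sorted by mass, a simple dynamic program stands in for the shortest-path subroutine (maximizing the score is equivalent to minimizing its negation). The four-letter residue alphabet, node scores and tolerance are hypothetical, and Antilope's integer linear program and its Lagrangian relaxation are not reproduced; this shows only the path-enumeration step.

```python
import math
from itertools import combinations

# Toy residue alphabet (monoisotopic residue masses, Da); hypothetical subset.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}


def build_spectrum_graph(masses, tol=0.02):
    """Link prefix-mass nodes whose mass difference equals one residue mass.

    `masses` is sorted ascending; index 0 is the N-terminus (mass 0) and the
    last index is the total residue mass. Returns {u: [(v, residue), ...]}.
    """
    edges = {u: [] for u in range(len(masses))}
    for u, v in combinations(range(len(masses)), 2):
        diff = masses[v] - masses[u]
        for aa, m in RESIDUES.items():
            if abs(diff - m) <= tol:
                edges[u].append((v, aa))
    return edges


def path_score(path, scores):
    return sum(scores[v] for v in path)


def best_path(edges, scores, src, dst, banned_nodes, banned_edges):
    """Highest-scoring src->dst path; node indices are already a topological order."""
    best, back = {src: scores[src]}, {}
    for u in range(src, dst + 1):
        if u not in best:
            continue  # unreachable from src (or banned)
        for v, _ in edges[u]:
            if v > dst or v in banned_nodes or (u, v) in banned_edges:
                continue
            cand = best[u] + scores[v]
            if cand > best.get(v, -math.inf):
                best[v], back[v] = cand, u
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(back[path[-1]])
    return path[::-1]


def yen_k_best(edges, scores, src, dst, k):
    """Yen's algorithm, adapted to return the k highest-scoring paths."""
    first = best_path(edges, scores, src, dst, set(), set())
    if first is None:
        return []
    accepted, candidates = [first], []
    while len(accepted) < k:
        prev = accepted[-1]
        for i in range(len(prev) - 1):
            root = prev[: i + 1]
            spur = root[-1]
            # Forbid the next edge of every accepted path that shares this root.
            banned_edges = {(p[i], p[i + 1]) for p in accepted if p[: i + 1] == root}
            spur_path = best_path(edges, scores, spur, dst, set(root[:-1]), banned_edges)
            if spur_path is not None:
                cand = root[:-1] + spur_path
                if cand not in accepted and cand not in candidates:
                    candidates.append(cand)
        if not candidates:
            break
        candidates.sort(key=lambda p: path_score(p, scores), reverse=True)
        accepted.append(candidates.pop(0))
    return accepted


def path_to_sequence(edges, path):
    return "".join(next(aa for v, aa in edges[u] if v == w) for u, w in zip(path, path[1:]))


if __name__ == "__main__":
    # Hypothetical prefix masses and peak-derived node scores.
    masses = [0.0, 57.02146, 71.03711, 87.03203, 128.05857, 215.0906]
    scores = [0.0, 2.0, 1.0, 0.5, 3.0, 0.0]
    graph = build_spectrum_graph(masses)
    for p in yen_k_best(graph, scores, 0, len(masses) - 1, k=3):
        print(path_to_sequence(graph, p), path_score(p, scores))
```

Returning several ranked paths rather than one is what lets a de novo tool report alternative candidate sequences (here GAS and AGS share the same prefix mass after two residues and differ only in order).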

arXiv.org e-Print Archive

CiteSeerX

VU Research Portal

Crossref

CWI's Institutional Repository

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Improved Algorithms for Discovery of New Genes in Bacterial Genomes

Author: Wang Nan
Publication venue: Scholars Junction
Publication date: 08/08/2009

In this dissertation, we describe a new approach to gene finding that can use proteomics information, in addition to DNA and RNA, to identify new genes in prokaryotic genomes. Proteomics processing pipelines require the identification of small pieces of proteins called peptides. Peptide identification is a very error-prone process, and we have developed a new algorithm for validating peptide identifications using a distance-based outlier detection method. Using standard mixtures of known proteins, we demonstrate that our method identifies more peptides than other popular methods. In addition, our algorithm provides a much more accurate estimate of the false discovery rate than other methods. Once peptides have been identified and validated, we use a second algorithm, proteogenomic mapping (PGM), to map these peptides to the genome and find the genetic signals that allow us to identify potential novel protein-coding genes, called expressed Protein Sequence Tags (ePSTs). We then collect and combine evidence for the generated ePSTs and use supervised machine learning techniques to evaluate the likelihood that each ePST represents a true new protein-coding gene. Finally, we have developed new approaches to Bayesian learning that allow us to model the knowledge domain from sparse biological datasets. We have developed two new bootstrap approaches that use resampling to build networks with the most robust features, namely those that recur in many networks. These bootstrap methods yield improved prediction accuracy. We have also developed an unsupervised Bayesian network structure learning method that can be used when training data are not available or when labels may be unreliable.
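The abstract does not specify how peptide identifications are embedded or which distance is used, so the following is only a generic distance-based outlier sketch in Python: each identification is a feature vector (the features here are hypothetical), features are z-scored, and the points whose distance to their k-th nearest neighbour is largest are flagged. This is the classic k-NN distance score, not necessarily the dissertation's method.

```python
import math
import statistics


def standardize(points):
    """Z-score each feature so no single feature dominates the distances."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((x - m) / s for x, m, s in zip(p, means, sds)) for p in points]


def knn_outlier_scores(points, k):
    """Distance from each point to its k-th nearest neighbour (larger = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Indices of the n points with the largest k-NN distance after standardizing."""
    scores = knn_outlier_scores(standardize(points), k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, one per peptide identification.
    features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(top_outliers(features, k=2, n=1))  # [4]
```

Using the k-th rather than the first nearest neighbour keeps a small tight group of isolated points from vouching for each other; k is a tuning parameter that the abstract does not fix.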

Scholars Junction - Mississippi State University Institutional Repository

Improved Algorithms for Discovery of New Genes in Bacterial Genomes

Author: Wang Nan
Publication venue: Scholars Junction
Publication date: 03/08/2009

Mississippi State University Libraries ETD database

Scholars Junction - Mississippi State University Institutional Repository

DART-ID increases single-cell proteome coverage

Author: Chen Albert Tian
Franks Alexander
Slavov Nikolai
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those composed of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because the lower protein amounts reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates to improve the confidence estimates of peptide-spectrum matches (PSMs). When applied to bulk or single-cell samples, DART-ID increased the number of data points by 30-50% at 1% false discovery rate (FDR), and thus reduced missing data. Benchmarks indicate excellent quantification of the peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type-specific proteins. The additional data points provided by DART-ID boost statistical power and double the number of proteins identified as differentially abundant between monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net.
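The core idea of letting retention time sharpen PSM confidence can be illustrated with Bayes' rule. This Python sketch is a simplified stand-in, not DART-ID's model: it assumes each PSM already has an aligned reference RT (`ref_rt`, hypothetical), that RTs of correct PSMs follow a narrow normal around it and RTs of incorrect PSMs a broad normal over the gradient, with `sigma_correct`, `null_mu` and `null_sigma` as made-up parameters. The updated posterior error probabilities (PEPs) are then thresholded with the common mean-PEP estimate of FDR.

```python
import math
from dataclasses import dataclass


@dataclass
class PSM:
    pep: float     # search-engine posterior error probability
    rt: float      # observed retention time (min)
    ref_rt: float  # hypothetical aligned reference RT for the peptide (min)


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def update_pep(psm, sigma_correct, null_mu, null_sigma):
    """Bayes' rule: combine the spectral PEP with an RT likelihood ratio.

    Correct PSMs are assumed to elute near ref_rt (narrow normal);
    incorrect ones anywhere in the gradient (broad normal).
    """
    like_correct = (1.0 - psm.pep) * normal_pdf(psm.rt, psm.ref_rt, sigma_correct)
    like_wrong = psm.pep * normal_pdf(psm.rt, null_mu, null_sigma)
    total = like_correct + like_wrong
    return psm.pep if total == 0.0 else like_wrong / total


def n_accepted_at_fdr(peps, fdr=0.01):
    """Size of the largest PEP-ranked set whose mean PEP (an FDR estimate) is <= fdr."""
    running, best = 0.0, 0
    for i, pep in enumerate(sorted(peps), start=1):
        running += pep
        if running / i <= fdr:
            best = i
    return best


if __name__ == "__main__":
    psms = [PSM(0.05, 30.2, 30.0), PSM(0.05, 41.0, 30.0), PSM(0.002, 55.1, 55.0)]
    before = [p.pep for p in psms]
    after = [update_pep(p, sigma_correct=0.5, null_mu=30.0, null_sigma=15.0) for p in psms]
    print(n_accepted_at_fdr(before), n_accepted_at_fdr(after))
```

A PSM whose RT agrees with its reference gains confidence and can cross the 1% FDR threshold, while one eluting far from its expected RT is down-weighted; that asymmetry is the mechanism by which RT information can add data points at a fixed FDR.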

Directory of Open Access Journals

eScholarship - University of California

Statistical methods for differential proteomics at peptide and protein level

Author: Goeminne Ludger
Publication venue: Ghent University. Faculty of Science; Faculty of Medicine and Health Sciences
Publication date: 01/01/2019

Ghent University Academic Bibliography