35,318 research outputs found

Integrative Analysis Frameworks for Improved Peptide and Protein Identifications from Tandem Mass Spectrometry Data.

Author: Shanmugam Avinash Kumar
Publication date: 01/01/2015

Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for high-throughput protein identification in modern proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. The major focus of these statistical methods is to assign probability scores to the identifications to distinguish between high-confidence, reliable identifications that may be accepted (typically corresponding to a false discovery rate, FDR, of 1% or 5%) and lower-confidence, spurious identifications that are rejected. These identification probabilities are determined, in general, considering only evidence from the MS/MS data. However, considering the wealth of external (orthogonal) data available for most biological systems, integrating such orthogonal information into proteomics analysis pipelines can be a promising approach to improve the sensitivity of these analysis pipelines and rescue true positive identifications that were rejected for want of sufficient evidence supporting their presence. In this dissertation, approaches based on naive Bayes rescoring, search space restriction, and a hybrid approach that combines both are described for integrating orthogonal information in proteomic analysis pipelines. These methods have been applied for integrating transcript abundance data from RNA-seq and identification frequency data from the Global Proteome Machine database, GPMDB (one of the largest repositories of proteomic experiment results), into analysis pipelines, improving the number of peptide and protein identifications from MS/MS data. Further, estimation of false discovery rates in very large proteomic datasets was also investigated.
In very large datasets, usually resulting from integrating data from multiple experiments, some assumptions used in typical target-decoy based FDR estimation in smaller datasets no longer hold true, resulting in artificially inflated error rates. Alternative approaches that would allow accurate FDR estimation in these large scale datasets have been described and benchmarked.

PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/116717/1/avinashs_1.pd
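The conventional target-decoy estimate that this abstract refers to can be illustrated with a minimal sketch. It assumes a concatenated target-decoy search in which each peptide-spectrum match (PSM) carries a score and a decoy flag, and it estimates FDR as the ratio of decoy to target matches above a score threshold. The function names and the `(score, is_decoy)` data layout are hypothetical, and this is the simple baseline estimate, not the alternative approaches developed in the dissertation.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: one PSM is (score, is_decoy), higher score = better match.
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR among PSMs scoring at or above `threshold`.

    Uses the simple ratio decoys / targets, which assumes that decoy hits
    are as frequent as incorrect target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted targets: treat any decoy hit as a fully unreliable set.
        return 1.0 if decoys else 0.0
    return decoys / targets


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None when no threshold satisfies the requested FDR.
    """
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best
```

Calling `score_cutoff(psms, max_fdr=0.01)` gives the score above which identifications would be accepted at an estimated 1% FDR under these assumptions.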

Deep Blue Documents at the University of Michigan

Bayesian nonparametric models for peak identification in MALDI-TOF mass spectroscopy

Author: Clyde Merlise A.
House Leanna L.
Wolpert Robert L.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/06/2011

We present a novel nonparametric Bayesian approach based on Lévy Adaptive Regression Kernels (LARK) to model spectral data arising from MALDI-TOF (Matrix Assisted Laser Desorption Ionization Time-of-Flight) mass spectrometry. This model-based approach provides identification and quantification of proteins through model parameters that are directly interpretable as the number of proteins, mass and abundance of proteins and peak resolution, while having the ability to adapt to unknown smoothness as in wavelet based methods. Informative prior distributions on resolution are key to distinguishing true peaks from background noise and resolving broad peaks into individual peaks for multiple protein species. Posterior distributions are obtained using a reversible jump Markov chain Monte Carlo algorithm and provide inference about the number of peaks (proteins), their masses and abundance. We show through simulation studies that the procedure has desirable true-positive and false-discovery rates. Finally, we illustrate the method on five example spectra: a blank spectrum, a spectrum with only the matrix of a low-molecular-weight substance used to embed target proteins, a spectrum with known proteins, and a single spectrum and average of ten spectra from an individual lung cancer patient.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS450 by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

DukeSpace

Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

Author: Blaženović Ivana
Fiehn Oliver
Ji Jian
Kind Tobias
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
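As an illustration of what MS/MS library matching involves, the following minimal sketch scores a query spectrum against a reference spectrum with a binned cosine similarity. This is one commonly used similarity measure, chosen here as an assumption; the abstract does not say which scores the review covers. The function names, the `(m/z, intensity)` peak layout, and the 1.0 m/z bin width are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: a spectrum is a list of (m/z, intensity) peaks.
Peak = Tuple[float, float]


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(query: List[Peak], reference: List[Peak],
                      bin_width: float = 1.0) -> float:
    """Cosine similarity of two binned spectra: 0.0 = no shared bins, 1.0 = same shape."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(q[k] * r[k] for k in q.keys() & r.keys())
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0
```

A higher score means the fragment patterns agree more closely; in practice a score threshold, precursor mass filtering, and intensity weighting would all affect how much confidence a match deserves.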

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

eScholarship - University of California

Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics.

Author: A El-Tayeb
AD Hanson
AL Hartman
Atsushi Ogiwara
C Fattuoni
C Ruttkies
CL Linster
DY Lee
F Allen
Gert Wohlgemuth
GJ Patti
H Sperber
H Tsugawa
HD Flosadóttir
Hiroshi Tsugawa
I Yamamoto
J Budczies
JG Jeffryes
John Meissen
K Haug
Kohei Takeuchi
M Sud
Masanori Arita
Matthew Mueller
Megan Showalter
MP Styczynski
MR Showalter
O Fiehn
O Khersonsky
Oliver Fiehn
Peter Beal
RR da Silva
S Kim
S Kumari
Sajjan Mehta
SE Stein
SM Rappaport
T Kind
Tobias Kind
WR Wikoff
Yuxuan Zheng
Zijuan Lai
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Novel metabolites distinct from canonical pathways can be identified through the integration of three cheminformatics tools: BinVestigate, which queries the BinBase gas chromatography-mass spectrometry (GC-MS) metabolome database to match unknowns with biological metadata across over 110,000 samples; MS-DIAL 2.0, a software tool for chromatographic deconvolution of high-resolution GC-MS or liquid chromatography-mass spectrometry (LC-MS); and MS-FINDER 2.0, a structure-elucidation program that uses a combination of 14 metabolome databases in addition to an enzyme promiscuity library. We showcase our workflow by annotating N-methyl-uridine monophosphate (UMP), lysomonogalactosyl-monopalmitin, N-methylalanine, and two propofol derivatives.

Crossref

eScholarship - University of California

Extensive mass spectrometry-based analysis of the fission yeast proteome: the Schizosaccharomyces pombe PeptideAtlas

Author: Aebersold
Alexa
Anderson
Baerenfaller
Beck
Benjamini
Beyer
Bitton
Brockmann
Brunner
Chen
Craig
Davis
de Godoy
de Graaf
Deutsch
Elias
Geer
Ghaemmaghami
Greenbaum
Gygi
Humphrey
Huttlin
Ishihama
Keller
Kuster
Lackner
Lam
Marguerat
Mata
Nagaraj
Nesvizhskii
Perkins
Reiter
Remacle
Sakharkar
Savas
Schmidt
Schrimpf
Sherwood
Shevchenko
Shteynberg
Sipiczki
Vizcaíno
Warringer
Wilhelm
Wood
Wu
Zhang
Publication date: 01/01/2013

We report a high-quality and system-wide proteome catalogue covering 71% (3,542 proteins) of the predicted genes of fission yeast, Schizosaccharomyces pombe, presenting the largest protein dataset to date for this important model organism. We obtained this high proteome and peptide (11.4 peptides/protein) coverage by a combination of extensive sample fractionation, high-resolution Orbitrap mass spectrometry, and combined database searching using the iProphet software as part of the Trans-Proteomics Pipeline. All raw and processed data are made accessible in the S. pombe PeptideAtlas. The identified proteins showed no biases in functional properties and allowed global estimation of protein abundances. The high coverage of the PeptideAtlas allowed correlation with transcriptomic data in a system-wide manner, indicating that post-transcriptional processes control the levels of at least half of all identified proteins. Interestingly, the correlation was not equally tight for all functional categories, ranging from r_s > 0.80 for proteins involved in translation to r_s < 0.45 for signal transduction proteins. Moreover, many proteins involved in DNA damage repair could not be detected in the PeptideAtlas despite their high mRNA levels, strengthening the translation-on-demand hypothesis for members of this protein class. In summary, the extensive and publicly available S. pombe PeptideAtlas together with the generated proteotypic peptide spectral library will be a useful resource for future targeted, in-depth, and quantitative proteomic studies on this microorganism.
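The per-category correlations quoted in this abstract (written r(s) or r_s) are assumed here to be Spearman rank correlations between protein and mRNA abundance. A minimal sketch of that statistic, computed as the Pearson correlation of average ranks, is given below. The function names and input layout are hypothetical, and the abstract does not describe how the authors computed their values.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Applied to the protein and transcript abundances of one functional category, `spearman(protein_levels, mrna_levels)` would yield a value comparable in kind to those reported, under the assumption stated above.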