153 research outputs found
Overview of shotgun proteomics data production.
<p>(A) Schematic of a typical shotgun proteomics experiment. The three steps, (1) cleaving proteins into peptides, (2) separation of peptides using liquid chromatography, and (3) tandem mass spectrometry analysis, are described in the text. (B) A sample fragmentation spectrum, along with the peptide responsible for generating the spectrum.</p>
Effects of Modified Digestion Schemes on the Identification of Proteins from Complex Mixtures
In shotgun proteomics, a complex protein mixture is digested to peptides, separated, and identified by microcapillary liquid chromatography coupled with tandem mass spectrometry (LC−MS/MS). In this technology, complete protein digestion is often assumed. We show that, to the contrary, modifications to a standard digestion protocol demonstrate large, reproducible improvements in protein identification, a result consistent with digestion being a limiting factor in the efficiency of protein identification.
Keywords: mass spectrometry • proteomics • digestion • protein identification
Efficient Marginalization to Compute Protein Posterior Probabilities from Shotgun Mass Spectrometry Data
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
Comparison of Database Search Strategies for High Precursor Mass Accuracy MS/MS Data
In shotgun proteomics, the analysis of tandem mass spectrometry data from peptides can benefit greatly from high mass accuracy measurements. In this study, we have evaluated two database search strategies that use high mass accuracy measurements of the peptide precursor ion. Our results indicate that peptide identifications are improved when spectra are searched with a wide mass tolerance window and precursor mass is used as a filter to discard incorrect matches. Database searches with a peptide data set constrained to peptides within a narrow mass window resulted in fewer peptide identifications but a significantly faster database search time.
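The wide-window strategy described in this abstract can be illustrated with a minimal Python sketch. It assumes the search has already returned candidate matches and only shows the post-search filtering step on precursor mass error. The function names, the dictionary layout, the 10 ppm cutoff, and the example masses are all hypothetical and are not taken from the study.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def filter_by_precursor_mass(matches, max_ppm=10.0):
    """Keep matches whose absolute precursor mass error is within max_ppm.

    `matches` is a list of dicts with 'observed_mass' and 'theoretical_mass'
    keys (a hypothetical layout chosen for this sketch).
    """
    return [
        m for m in matches
        if abs(ppm_error(m["observed_mass"], m["theoretical_mass"])) <= max_ppm
    ]


# Made-up matches from a wide-tolerance search; the masses are illustrative only.
matches = [
    {"peptide": "PEPTIDEK", "observed_mass": 1000.005, "theoretical_mass": 1000.000},
    {"peptide": "SAMPLER", "observed_mass": 1000.500, "theoretical_mass": 1000.000},
]

kept = filter_by_precursor_mass(matches, max_ppm=10.0)
print([m["peptide"] for m in kept])
```

The first match is off by about 5 ppm and survives the filter; the second is off by about 500 ppm and is discarded.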
Learning Score Function Parameters for Improved Spectrum Identification in Tandem Mass Spectrometry Experiments
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, or Q-ranker, follow a similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows a more effective ranking of the peptide-spectrum matches based on the target-decoy strategy, in comparison to a baseline feature set devoid of these XCorr-based features. Ranking using the extended feature set gives 10–40% improvement in the number of distinct peptide identifications relative to a range of <i>q</i>-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as similarity score between experimental spectra and peptide sequences.
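The abstract above evaluates rankings by counting identifications at q-value thresholds under the target-decoy strategy. As a rough illustration, the Python sketch below uses one common simple estimator (decoys divided by targets above each score threshold, then made monotone). This is an assumption made for the example, not necessarily the estimator used in the paper, and the function name and scores are made up.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for target matches from (score, is_decoy) pairs.

    FDR at each score threshold is estimated as decoys / targets among all
    matches scoring at or above it. The q-value of a match is the smallest
    estimated FDR at its own threshold or any more permissive one.
    Returns (score, q_value) pairs for targets, best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    targets = decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Walk from the worst score upward, carrying the minimum FDR seen so far.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, q) for (s, is_decoy), q in zip(ranked, qvals) if not is_decoy]


# Made-up scores: True marks a decoy match.
psms = [(5.0, False), (4.0, False), (3.0, True), (2.0, False), (1.0, True)]
results = target_decoy_qvalues(psms)
accepted = [s for s, q in results if q <= 0.01]
print(results, len(accepted))
```

A better ranking function pushes decoys toward the bottom of the list, so more targets fall under a given q-value threshold; that count is the quantity the abstract reports as improved.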
Improving Tandem Mass Spectrum Identification Using Peptide Retention Time Prediction across Diverse Chromatography Conditions
Most algorithms for identifying peptides from tandem mass spectra use information only from the final spectrum, ignoring non-mass-based information acquired routinely in liquid chromatography tandem mass spectrometry analyses. One physicochemical property that is always obtained but rarely exploited is peptide chromatographic retention time. Efforts to use chromatographic retention time to improve peptide identification are complicated because of the variability of retention time in different experimental conditions, making retention time calculations nongeneralizable. We show that peptide retention time can be reliably predicted by training and testing a support vector regressor on a small collection of data from a single liquid chromatography run. This model can be used to filter peptide identifications with observed retention time that deviates from predicted retention time. After filtering, positive peptide identifications increase by as much as 50% at a false discovery rate of 3%. We demonstrate that our dynamically trained model generalizes well across diverse chromatography conditions and methods for generating peptides, in particular improving peptide identification using nonspecific proteases.
High Quality Catalog of Proteotypic Peptides from Human Heart
Proteomics research is beginning to expand beyond the more traditional shotgun analysis of protein mixtures to include targeted analyses of specific proteins using mass spectrometry. Integral to the development of a robust assay based on targeted mass spectrometry is prior knowledge of which peptides provide an accurate and sensitive proxy of the originating gene product (i.e., proteotypic peptides). To develop a catalog of “proteotypic peptides” in human heart, TRIzol extracts of left-ventricular tissue from nonfailing and failing human heart explants were optimized for shotgun proteomic analysis using Multidimensional Protein Identification Technology (MudPIT). Ten replicate MudPIT analyses were performed on each tissue sample and resulted in the identification of 30 605 unique peptides with a q-value ≤ 0.01, corresponding to 7138 unique human heart proteins. Experimental observation frequencies were assessed and used to select over 4476 proteotypic peptides for 2558 heart proteins. This human cardiac data set can serve as a public reference to guide the selection of proteotypic peptides for future targeted mass spectrometry experiments monitoring potential protein biomarkers of human heart diseases.
Selection on Plant Male Function Genes Identifies Candidates for Reproductive Isolation of Yellow Monkeyflowers
<div><p>Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identification of candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (<i>Mimulus</i> spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of <i>M. guttatus</i> flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of <i>M. guttatus</i>, and identify those genes with evidence of selective sweeps or balancing selection consistent with their role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence. 
Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the goal of identifying the molecular basis of genetically complex traits.</p></div>
Isotope Signatures Allow Identification of Chemically Cross-Linked Peptides by Mass Spectrometry: A Novel Method to Determine Interresidue Distances in Protein Structures through Cross-Linking
Knowledge of protein structures and protein−protein interactions is essential for understanding of biological processes. Recent advances in protein cross-linking and mass spectrometry (MS) have shown significant potential to contribute to this area. Here we report a novel method to rapidly and accurately identify cross-linked peptides based on their unique isotope signature when digested in the presence of H₂¹⁸O. This method overcomes the need for specially synthesized cross-linkers and/or multiple MS runs required by other techniques. We validated our method by performing a “blind” analysis of 5 proteins/complexes of known structure. Side chain repacking calculations using Rosetta show that 17 of our 20 positively identified cross-links fit the published atomic structures. The remaining 3 cross-links are likely due to protein aggregation. The accuracy and rapid throughput of our workflow will advance the use of protein cross-linking in structural biology.
