37 research outputs found
Spectral Library Generating Function for Assessing Spectrum-Spectrum Match Significance
Tandem mass spectrometry (MS/MS) continues to be the technology
of choice for high-throughput analysis of complex proteomics samples.
While MS/MS spectra are commonly identified by matching against a
database of known protein sequences, the complementary approach of
spectral library searching against collections of reference spectra
consistently outperforms sequence-based searches by identifying significantly
more spectra. However, while spectral library searches
benefit from the advance knowledge of the expected peptide fragmentation
patterns recorded in library spectra, estimation of the statistical
significance of spectrum-spectrum matches (SSMs) continues to be hindered
by difficulties in finding an appropriate definition of “random”
SSMs to use as a null model when estimating the significance of true
SSMs. We propose to avoid this problem by changing the null hypothesis:
instead of determining the probability of observing a high SSM score
between randomly matched spectra, we estimate the probability of observing
a low SSM score between <i>replicate</i> spectra of the
same molecule. To this end, we explicitly model the variation in instrument
measurements of MS/MS peak intensities and show how these models can
be used to determine a theoretical distribution of SSM scores between
reference and query spectra of the same molecule. While the proposed
spectral library generating function (SLGF) approach can be used to
calculate theoretical distributions for any additive SSM score (e.g.,
any dot product), we further show how it can be used to calculate
the distribution of expected cosines between reference and query spectra.
We developed a spectral library search tool, Tremolo, and demonstrate
that this SLGF-based search tool significantly outperforms current
state-of-the-art spectral library search tools. We also provide a detailed
discussion of the multiple reasons behind the observed differences
in the sets of identified MS/MS spectra.
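The idea of deriving a theoretical distribution for an additive score can be illustrated with a minimal sketch. This is not the SLGF algorithm or Tremolo's implementation; it assumes, purely for illustration, that each peak contributes an independent, integer-discretized score with known probabilities, and it combines those contributions by repeated convolution. The function names (`cosine`, `additive_score_distribution`, `prob_score_at_most`) and the binned-dictionary spectrum representation are hypothetical.

```python
import math
from collections import defaultdict


def cosine(ref, query):
    """Cosine between two spectra given as {mz_bin: intensity} dicts."""
    dot = sum(v * query.get(k, 0.0) for k, v in ref.items())
    norm_ref = math.sqrt(sum(v * v for v in ref.values()))
    norm_query = math.sqrt(sum(v * v for v in query.values()))
    if norm_ref == 0.0 or norm_query == 0.0:
        return 0.0
    return dot / (norm_ref * norm_query)


def additive_score_distribution(per_peak):
    """Distribution of a sum of per-peak scores.

    per_peak is a list with one {integer_score: probability} dict per peak.
    Assumption (not from the source): peaks contribute independently,
    so the distribution of the total is the convolution of the parts.
    """
    dist = {0: 1.0}
    for peak in per_peak:
        nxt = defaultdict(float)
        for score, p in dist.items():
            for delta, q in peak.items():
                nxt[score + delta] += p * q
        dist = dict(nxt)
    return dist


def prob_score_at_most(dist, threshold):
    """Probability that the total score is <= threshold (a low-score tail)."""
    return sum(p for score, p in dist.items() if score <= threshold)


# Two hypothetical peaks, each scoring 0 or 1 with equal probability.
example = additive_score_distribution([{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}])
```

Under these assumptions, `prob_score_at_most(example, 0)` gives the probability that two replicate spectra of the same molecule would score this low by measurement variation alone, which is the kind of quantity the abstract proposes to use in place of a random-match null model.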
Shotgun Protein Sequencing by Tandem Mass Spectra Assembly
The analysis of mass spectrometry data is still largely
based on identification of single MS/MS spectra and does
not attempt to make use of the extra information available
in multiple MS/MS spectra from partially or completely
overlapping peptides. Analysis of MS/MS spectra from
multiple overlapping peptides opens up the possibility of
assembling MS/MS spectra into entire proteins, similarly
to the assembly of overlapping DNA reads into entire
genomes. In this paper, we present for the first time a
way to detect, score, and interpret overlaps between
uninterpreted MS/MS spectra in an attempt to sequence
entire proteins rather than individual peptides. We show
that this approach not only extends the length of reconstructed amino acid sequences but also dramatically
improves the quality of de novo peptide sequencing, even
for low mass accuracy MS/MS data.
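As a rough illustration of what detecting an overlap between two uninterpreted spectra can involve, the sketch below votes over pairwise peak-mass differences to find a single mass shift that aligns many peaks. This is a deliberately simplified stand-in, not the method of the paper; the function name `best_shift`, the bin width, and the minimum match count are all hypothetical choices.

```python
from collections import Counter


def best_shift(peaks_a, peaks_b, bin_width=0.5, min_matches=3):
    """Find the mass shift that aligns the most peaks of A onto B.

    peaks_a, peaks_b: lists of peak masses (floats).
    Every pair (a, b) votes for the binned difference b - a; a shift
    supported by many pairs suggests the two spectra come from
    overlapping peptides. Returns (shift, votes) or None.
    """
    votes = Counter()
    for a in peaks_a:
        for b in peaks_b:
            votes[round((b - a) / bin_width)] += 1
    if not votes:
        return None
    bin_index, count = votes.most_common(1)[0]
    if count < min_matches:
        return None
    return bin_index * bin_width, count


# Hypothetical peak lists: three peaks of A reappear in B shifted by -57.
spectrum_a = [100.0, 200.0, 350.0, 500.0]
spectrum_b = [143.0, 293.0, 443.0, 600.0]
overlap = best_shift(spectrum_a, spectrum_b)
```

A real overlap scorer would also weigh peak intensities and tolerate modifications; the voting step only shows why a consistent offset between two peak lists is evidence of overlap.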
ProteinExplorer: A Repository-Scale Resource for Exploration of Protein Detection in Public Mass Spectrometry Data Sets
High-throughput tandem mass spectrometry has enabled the detection
and identification of over 75% of all proteins predicted to result
in translated gene products in the human genome. In fact, the galloping
rate of data acquisition and sharing of mass spectrometry data has
led to the current availability of many tens of terabytes of public
data in thousands of human data sets. The systematic reanalysis of
these public data sets has been used to build a community-scale spectral
library of 2.1 million precursors for over 1 million unique sequences
from over 19,000 proteins (including spectra of synthetic peptides).
However, it has remained challenging to find and inspect spectra of
peptides covering functional protein regions or matching novel proteins.
ProteinExplorer addresses these challenges with an intuitive interface
mapping tens of millions of identifications to functional sites on
nearly all human proteins while maintaining provenance for every identification
back to the original data set and data file. Additionally, ProteinExplorer
facilitates the selection and inspection of HPP-compliant peptides
whose spectra can be matched to spectra of synthetic peptides and
already includes HPP-compliant evidence for 107 missing (PE2, PE3,
and PE4) and 23 dubious (PE5) proteins. Finally, ProteinExplorer allows
users to rate spectra and to contribute to a community library of
peptides entitled PrEdict (Protein Existance dictionary) mapping to
novel proteins but whose preliminary identities have not yet been
fully established with community-scale false discovery rates and synthetic
peptide spectra. ProteinExplorer can now be accessed at https://massive.ucsd.edu/ProteoSAFe/protein_explorer_splash.jsp
Sequencing-Grade <i>De novo</i> Analysis of MS/MS Triplets (CID/HCD/ETD) From Overlapping Peptides
Full-length <i>de novo</i> sequencing of unknown proteins
remains a challenging open problem. Traditional methods that sequence
spectra individually are limited by short peptide length, incomplete
peptide fragmentation, and ambiguous <i>de novo</i> interpretations.
We address these issues by determining consensus sequences for assembled
tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using
multiple enzymatic digests). We have combined electron-transfer dissociation
(ETD) with collision-induced dissociation (CID) and higher-energy
collision-induced dissociation (HCD) fragmentation methods to boost
interpretation of long, highly charged peptides and take advantage
of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies,
we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides
yield <i>de novo</i> sequences of average length 70 AA and
as long as 200 AA at up to 99% sequencing accuracy.
<i>SweetNET</i>: A Bioinformatics Workflow for Glycopeptide MS/MS Spectral Analysis
Glycoproteomics has rapidly become an independent analytical platform
bridging the fields of glycomics and proteomics to address site-specific
protein glycosylation and its impact in biology. Current glycopeptide
characterization relies on time-consuming manual interpretations and
demands high levels of personal expertise. Efficient data interpretation
constitutes one of the major challenges to be overcome before true
high-throughput glycopeptide analysis can be achieved. The development
of new glyco-related bioinformatics tools is thus of crucial importance
to fulfill this goal. Here we present <i>SweetNET</i>: a
data-oriented bioinformatics workflow for efficient analysis of hundreds
of thousands of glycopeptide MS/MS-spectra. We have analyzed MS data
sets from two separate glycopeptide enrichment protocols targeting
sialylated glycopeptides and chondroitin sulfate linkage region glycopeptides,
respectively. Molecular networking was performed to organize the glycopeptide
MS/MS data based on spectral similarities. The combination of spectral
clustering, oxonium ion intensity profiles, and precursor ion <i>m</i>/<i>z</i> shift distributions provided typical
signatures for the initial assignment of different N-, O- and CS-glycopeptide
classes and their respective glycoforms. These signatures were further
used to guide database searches leading to the identification and
validation of a large number of glycopeptide variants including novel
deoxyhexose (fucose) modifications in the linkage region of chondroitin
sulfate proteoglycans.
Clustering Millions of Tandem Mass Spectra
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in a significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) that can reduce the number of spectra submitted for further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec
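The replace-redundant-spectra-with-a-representative idea can be sketched with a simple greedy pass. This is not the MS-Clustering algorithm; it is a minimal illustration that assumes binned spectra, uses the first member of each cluster as its representative, and relies on hypothetical thresholds (`min_cosine`, `precursor_tol`) chosen only for the example.

```python
import math


def cosine(a, b):
    """Cosine between two spectra given as {mz_bin: intensity} dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(spectra, min_cosine=0.7, precursor_tol=2.0):
    """Group spectra that look like repeats of the same peptide.

    spectra: list of (precursor_mz, {mz_bin: intensity}) tuples.
    A spectrum joins the first cluster whose representative has a close
    precursor m/z and a high enough cosine; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of input indices.
    """
    clusters = []
    for index, (precursor_mz, peaks) in enumerate(spectra):
        for cluster in clusters:
            close = abs(cluster["precursor"] - precursor_mz) <= precursor_tol
            if close and cosine(cluster["rep"], peaks) >= min_cosine:
                cluster["members"].append(index)
                break
        else:
            # No existing cluster matched: this spectrum becomes a new representative.
            clusters.append({"precursor": precursor_mz, "rep": peaks, "members": [index]})
    return [cluster["members"] for cluster in clusters]


# Hypothetical input: spectra 0 and 1 are near-identical replicates.
demo = [
    (500.0, {100: 1.0, 200: 1.0}),
    (500.1, {100: 0.9, 200: 1.1}),
    (700.0, {100: 1.0, 200: 1.0}),
    (500.0, {300: 1.0}),
]
groups = greedy_cluster(demo)
```

Only one spectrum per group would then be submitted to the database search, which is where the speed-up described above comes from. A production tool would typically build a consensus representative rather than reuse the first member.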