PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search
Standard proteomics workflows use tandem mass spectrometry followed
by sequence database search to analyze complex biological samples.
The identification of proteins carrying post-translational modifications,
for example, phosphorylation, is typically addressed by allowing variable
modifications in the searched sequences. Accounting for these variations
exponentially increases the combinatorial space in the database, which
leads to increased processing times and more false positive identifications.
PhoStar, the tool presented here, identifies spectra that originate
from phosphorylated peptides before database search using a supervised
machine learning approach. The model for the prediction of phosphorylation
was trained and validated with an accuracy of 97.6% on a large set
of high-confidence spectra collected from publicly available experimental
data. Its power was further validated by predicting phosphorylation
in the complete NIST human and mouse high collision-dissociation spectral
libraries, achieving accuracies of 98.2% and 97.9%, respectively.
We demonstrate the application of PhoStar by using it for spectra
filtering before database search. In database searches of HeLa samples,
the peptide search space was reduced by 27–66% while finding
at least 97% of total peptide identifications (at 1% FDR) compared
with a standard workflow.
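
The filtering step described above can be pictured with a minimal sketch. It assumes each spectrum already carries a phosphorylation probability from some classifier. The `Spectrum` class, the `phospho_probability` field, the `route_spectra` helper, and the 0.5 threshold are all hypothetical names and values chosen for illustration; they are not PhoStar's actual interface or settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    scan_id: str
    # Hypothetical classifier output in [0, 1]; not PhoStar's real format.
    phospho_probability: float


def route_spectra(
    spectra: List[Spectrum], threshold: float = 0.5
) -> Tuple[List[Spectrum], List[Spectrum]]:
    """Split spectra into predicted-phosphorylated and predicted-unmodified sets.

    Only the first set would then be searched with phosphorylation enabled
    as a variable modification; the second set would be searched without it.
    """
    phospho: List[Spectrum] = []
    unmodified: List[Spectrum] = []
    for spectrum in spectra:
        if spectrum.phospho_probability >= threshold:
            phospho.append(spectrum)
        else:
            unmodified.append(spectrum)
    return phospho, unmodified


def fraction_predicted_phospho(
    spectra: List[Spectrum], threshold: float = 0.5
) -> float:
    """Fraction of spectra routed to the phosphorylation-aware search."""
    if not spectra:
        return 0.0
    phospho, _ = route_spectra(spectra, threshold)
    return len(phospho) / len(spectra)
```

Under this assumed design, the threshold trades off how many spectra are sent to the larger, modification-aware search against the risk of missing true phosphopeptide spectra.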