Improving the Results of De novo Peptide Identification via Tandem Mass Spectrometry Using a Genetic Programming-based Scoring Function for Re-ranking Peptide-Spectrum Matches
De novo peptide sequencing algorithms have been widely used in proteomics to
analyse tandem mass spectra (MS/MS) and assign them to peptides, but
quality-control methods to evaluate the confidence of de novo peptide
sequencing are lagging behind. A fundamental part of a quality-control method
is the scoring function used to evaluate the quality of peptide-spectrum
matches (PSMs). Here, we propose a genetic programming (GP) based method,
called GP-PSM, to learn a PSM scoring function for improving the rate of
confident peptide identification from MS/MS data. The GP method learns from
thousands of MS/MS spectra. Important characteristics of match quality are
extracted from the learning set and incorporated into the GP scoring
functions. We compare GP-PSM with two other methods, Support Vector
Regression (SVR) and Random Forest (RF). Each of the three methods is used to
post-process the peptide identifications produced by PEAKS, a commonly used
de novo sequencing method. The results show that GP-PSM
outperforms RF and SVR and discriminates accurately between correct and
incorrect PSMs. It correctly assigns peptides to 10% more spectra on an
evaluation dataset containing 120 MS/MS spectra and decreases the false
positive rate (FPR) of peptide identification.
Comment: 13 pages, conference paper, containing 2 figures
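
The post-processing idea in the abstract, re-ranking candidate PSMs with a scoring function and measuring the false positive rate, can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature names (matched_ion_fraction, mass_error_ppm), the example_score expression, and the candidate data are hypothetical and are not the features, evolved function, or data of GP-PSM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical PSM features; the abstract does not list the features GP-PSM uses.
Features = Dict[str, float]
Candidate = Tuple[str, Features]  # (peptide sequence, features of its PSM)


def example_score(f: Features) -> float:
    """Stand-in for a learned scoring expression (not the published function)."""
    return f["matched_ion_fraction"] * 2.0 - f["mass_error_ppm"] / 10.0


def rerank(candidates: List[Candidate],
           score: Callable[[Features], float]) -> List[Candidate]:
    """Sort the candidate peptides of one spectrum from best to worst score."""
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


def false_positive_rate(accepted: List[bool], correct: List[bool]) -> float:
    """FPR = accepted incorrect PSMs / all incorrect PSMs."""
    negatives = [a for a, c in zip(accepted, correct) if not c]
    return sum(negatives) / len(negatives) if negatives else 0.0


# Made-up candidates for a single spectrum, in the order a sequencer might report them.
candidates = [
    ("PEPTLDE", {"matched_ion_fraction": 0.55, "mass_error_ppm": 2.0}),
    ("PEPTIDE", {"matched_ion_fraction": 0.80, "mass_error_ppm": 4.0}),
]
best_peptide = rerank(candidates, example_score)[0][0]
```

Any scoring function with the same signature, for instance a fitted regressor wrapped in a callable, could be passed to rerank in place of example_score.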