Empirical Multidimensional Space for Scoring Peptide Spectrum Matches in Shotgun Proteomics
Data-dependent tandem mass spectrometry (MS/MS) is one of the main
techniques for protein identification in shotgun proteomics. In a
typical LC–MS/MS workflow, peptide product ion mass spectra
(MS/MS spectra) are compared with those derived theoretically from
a protein sequence database. Scoring of these matches results in peptide
identifications. A set of peptide identifications is characterized
by its false discovery rate (FDR), the estimated fraction of false
identifications in the set. The total number of peptides targeted
for fragmentation is in the range of 10 000 to 20 000
for a several-hour LC–MS/MS run. Typically, <50% of these
MS/MS spectra result in peptide-spectrum matches (PSMs). A small fraction
of PSMs pass the preset FDR level (commonly 1%), giving a list of identified
proteins, yet a large number of correct PSMs corresponding to the
peptides originally present in the sample are left behind in the “grey
area” below the identity threshold. Following the numerous
efforts to recover these correct PSMs, here we investigate the utility
of a scoring scheme based on the multiple PSM descriptors available
from the experimental data. These descriptors include retention time,
deviation between experimental and theoretical mass, number of missed
cleavages upon in-solution protein digestion, precursor ion fraction
(PIF), PSM count per sequence, potential modifications, median fragment
mass error, ¹³C isotope mass difference, charge states,
and number of PSMs per protein. The proposed scheme utilizes a set
of metrics obtained for the corresponding distributions of each of
the descriptors. We found that the proposed PSM scoring algorithm
differentiates equally or more efficiently between correct and incorrect
identifications compared with existing postsearch validation approaches.
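
The FDR filtering step described above can be illustrated with a minimal sketch. The abstract does not state how the FDR is estimated, so the sketch assumes the common target-decoy strategy, in which each PSM carries a flag marking whether it matched a decoy sequence and the FDR is approximated as decoys divided by targets. The function name `filter_by_fdr` and the `(score, is_decoy)` tuple layout are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A PSM is represented here as (score, is_decoy); higher scores are better.
# This layout is an illustrative assumption, not the paper's data model.
PSM = Tuple[float, bool]


def filter_by_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs from the largest top-scoring set whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = i
    # Report only target matches; decoys serve for error estimation only.
    return [p for p in ranked[:cutoff] if not p[1]]


if __name__ == "__main__":
    # Toy data: 150 confident targets, then a mixed low-scoring tail.
    example = [(1000.0 - i, False) for i in range(150)]
    example += [(500.0, True), (499.0, False), (498.0, True),
                (497.0, True), (496.0, False)]
    accepted = filter_by_fdr(example, max_fdr=0.01)
    print(len(accepted), "target PSMs accepted at 1% FDR")
```

In this toy input the target PSM scored 496 falls below the threshold even though it is flagged as a target, which is the kind of "grey area" match that a richer, multi-descriptor score aims to recover.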