PIPI: PTM-Invariant Peptide Identification Using Coding Method
Abstract
In computational proteomics, the identification of peptides with
an unlimited number of post-translational modification (PTM) types
is a challenging task. The computational cost associated with database
search increases exponentially with respect to the number of modified
amino acids and linearly with respect to the number of potential PTM
types at each amino acid. The problem becomes intractable very quickly
if we want to enumerate all possible PTM patterns. To address this
issue, one group of methods, named restricted tools (including Mascot,
Comet, and MS-GF+), allows only a small number of PTM types in the database
search process. Alternatively, the other group of methods, named unrestricted
tools (including MS-Alignment, ProteinProspector, and MODa), avoids
enumerating PTM patterns by using an alignment-based approach to localize
and characterize modified amino acids. However, because of the large
search space and the PTM localization issue, the sensitivity of these
unrestricted tools is low. This paper proposes a novel method named
PIPI to achieve PTM-invariant peptide identification. PIPI belongs
to the category of unrestricted tools. It first codes peptide sequences
into Boolean vectors and codes experimental spectra into real-valued
vectors. For each coded spectrum, it then searches the coded sequence
database to find the top-scoring peptide sequences as candidates. After
that, PIPI uses dynamic programming to localize and characterize modified
amino acids in each candidate. We used simulation experiments and
real data experiments to evaluate the performance in comparison with
restricted tools (i.e., Mascot, Comet, and MS-GF+) and unrestricted
tools (i.e., Mascot with error tolerant search, MS-Alignment, ProteinProspector,
and MODa). Comparison with restricted tools shows that PIPI achieves
comparable sensitivity and running speed. Comparison with unrestricted
tools shows that PIPI has the highest sensitivity except for Mascot
with error tolerant search and ProteinProspector. These two tools
simplify the task by considering at most one modified amino acid
in each peptide, which yields higher sensitivity but makes it difficult
to handle peptides with multiple modified amino acids. The simulation experiments
also show that PIPI has the lowest false discovery proportion, the
highest PTM characterization accuracy, and the shortest running time
among the unrestricted tools.
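
The coding-and-search step described above can be illustrated with a minimal sketch. The abstract does not specify which features are coded, so everything concrete here is an assumption made for illustration: the Boolean vector marks which length-3 substrings ("tags") occur in a peptide, the coded spectrum is a hypothetical mapping from tags to real-valued weights, and candidates are ranked by a normalized dot product. The function names `code_peptide`, `score`, and `top_candidates` are hypothetical and are not taken from the PIPI software.

```python
import math

TAG_LEN = 3  # assumed tag length; not stated in the abstract


def code_peptide(seq, k=TAG_LEN):
    """Code a peptide as a sparse Boolean vector: the set of k-mers it contains."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score(spectrum_code, peptide_code):
    """Normalized dot product of a real-valued spectrum vector (dict: tag -> weight)
    and a Boolean peptide vector (set of tags)."""
    if not spectrum_code or not peptide_code:
        return 0.0
    dot = sum(w for tag, w in spectrum_code.items() if tag in peptide_code)
    norm_spectrum = math.sqrt(sum(w * w for w in spectrum_code.values()))
    norm_peptide = math.sqrt(len(peptide_code))  # Boolean vector: norm = sqrt(#ones)
    return dot / (norm_spectrum * norm_peptide)


def top_candidates(spectrum_code, database, n=2):
    """Return the n best-scoring (score, peptide) pairs from the database."""
    scored = [(score(spectrum_code, code_peptide(p)), p) for p in database]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # high score first, ties by name
    return scored[:n]


# Toy data (invented for illustration only).
database = ["PEPTIDEK", "SAMPLER", "PROTEINK"]
spectrum_code = {"PEP": 0.9, "EPT": 0.8, "TID": 0.5, "AMP": 0.1}
candidates = top_candidates(spectrum_code, database)
```

Under these assumptions, a modification changes only the tags that overlap the modified residue, so the remaining tags can still match the unmodified database sequence. The sketch covers candidate retrieval only; the subsequent dynamic-programming step that localizes and characterizes modifications is not shown.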