Binomial Probability Distribution Model-Based Protein Identification Algorithm for Tandem Mass Spectrometry Utilizing Peak Intensity Information
Abstract
Mass spectrometry has become one of the most important technologies
in proteomic analysis. Liquid chromatography coupled with tandem mass
spectrometry (LC-MS/MS) is a major
tool for the analysis of peptide mixtures from protein samples. The
key step of MS data processing is the identification of peptides from
experimental spectra by searching public sequence databases. Although
a number of algorithms for identifying peptides from MS/MS data have
already been proposed (e.g., Sequest, OMSSA, X!Tandem, and Mascot), they
are mainly based on statistical models that consider only peak matches
between experimental and theoretical spectra, not peak intensity
information. Moreover, different algorithms give different results
from the same MS data, which suggests that they are incomplete and that
their reproducibility is questionable. We developed a novel peptide identification
algorithm, ProVerB, based on a binomial probability distribution model
of protein tandem mass spectrometry combined with a new scoring function
that makes full use of peak intensity information and thus improves
identification performance. At a 1% false discovery rate (FDR), ProVerB
identified significantly more peptides from LC-MS/MS data sets than
Mascot, Sequest, and SQID,
and provided more confident peptide identifications. ProVerB is also
compatible with various platforms and experimental data sets, showing
its robustness and versatility. The open-source program ProVerB is
available at http://bioinformatics.jnu.edu.cn/software/proverb/
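
To make the idea of a binomial peak-match model concrete, the sketch below scores a peptide-spectrum match by the probability of observing at least the given number of matched peaks by chance. It is a generic illustration and not ProVerB's actual scoring function: the abstract does not specify that function, and the intensity weighting it describes is not shown here. The names `binomial_tail`, `match_score`, `n_theoretical`, `n_matched`, and `p_random` are hypothetical, as is the choice of a negative log10 transform.

```python
from math import comb, log10


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k <= 0:
        return 1.0  # at least zero matches is certain
    if k > n:
        return 0.0  # cannot match more peaks than exist
    # Sum the binomial probability mass from k up to n.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_theoretical: int, n_matched: int, p_random: float) -> float:
    """Assumed score: -log10 of the chance probability of the observed matches.

    n_theoretical: number of theoretical fragment peaks for the candidate peptide
    n_matched:     number of those peaks found in the experimental spectrum
    p_random:      assumed probability that a single peak matches by chance
    """
    tail = binomial_tail(n_theoretical, n_matched, p_random)
    return float("inf") if tail == 0.0 else -log10(tail)


if __name__ == "__main__":
    # Illustrative numbers only: more matched peaks give a higher score.
    for matched in (2, 5, 10):
        print(matched, round(match_score(20, matched, 0.05), 2))
```

In this form a higher score means the observed matches are less likely to be random. An intensity-aware method such as the one the abstract describes would additionally have to account for how strong each matched peak is, which this sketch does not attempt.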