In shotgun proteomics, peptides are typically identified by
database searching, which involves scoring acquired tandem mass spectra
against peptides derived from standard protein sequence databases
such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity
of peptide identification is known to decrease as the search space
grows. Therefore, creating a targeted sequence database containing
only peptides likely to be present in the analyzed sample can be a
useful technique for improving the sensitivity of peptide identification.
In this study, we describe how targeted peptide databases can be created
based on the frequency of identification in the Global Proteome Machine
Database (GPMDB), the largest publicly available repository of peptide
and protein identification data. We demonstrate that targeted peptide
databases can be easily integrated into existing proteome analysis
workflows and describe a computational strategy for minimizing any
loss of peptide identifications caused by the incompleteness of
targeted search spaces. We demonstrate the performance
of our workflow using several data sets of varying size and sample
complexity.
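
The idea of building a targeted database from identification frequency can be illustrated with a minimal sketch. It assumes the per-peptide observation counts have already been exported into a plain mapping from peptide sequence to count. The function names, the `min_observations` threshold, and the counts themselves are hypothetical and chosen only for illustration. They are not GPMDB's data format and not the workflow used in this study.

```python
from typing import Dict, List


def select_targeted_peptides(
    observation_counts: Dict[str, int], min_observations: int
) -> List[str]:
    """Keep peptides observed at least `min_observations` times.

    The result is ordered by descending count, with ties broken
    alphabetically so the output is deterministic.
    """
    kept = [
        (peptide, count)
        for peptide, count in observation_counts.items()
        if count >= min_observations
    ]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [peptide for peptide, _ in kept]


def to_fasta(peptides: List[str], prefix: str = "targeted") -> str:
    """Write each peptide as its own FASTA entry with a numbered header."""
    return "".join(
        f">{prefix}_{index}\n{peptide}\n"
        for index, peptide in enumerate(peptides, start=1)
    )


# Made-up counts for illustration only (assumption, not real repository data).
example_counts = {
    "LVNELTEFAK": 1200,
    "AEFVEVTK": 950,
    "YLYEIAR": 40,
    "QTALVELLK": 3,
}

targeted = select_targeted_peptides(example_counts, min_observations=50)
fasta_text = to_fasta(targeted)
print(fasta_text, end="")
```

The resulting FASTA text could then be given to a search engine in place of a full protein database. A fixed count threshold is only one possible selection rule, and any peptide below it is lost from the search space, which is the incompleteness that the abstract refers to.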