The State of the Human Proteome in 2012 as Viewed through PeptideAtlas
Abstract
The Human Proteome Project was launched in September 2010 with
the goal of characterizing at least one protein product from each
protein-coding gene. Here we assess how much of the proteome has been
detected to date via tandem mass spectrometry by analyzing PeptideAtlas,
a compendium of human-derived LC–MS/MS proteomics data from
many laboratories around the world. All data sets are processed with
a consistent set of parameters using the Trans-Proteomic Pipeline
and subjected to a 1% protein FDR filter before inclusion in PeptideAtlas.
Therefore, PeptideAtlas contains only high-confidence protein identifications.
To increase proteome coverage, we explored new comprehensive public
data sources for data likely to add new proteins to the Human PeptideAtlas.
We then folded these data into a Human PeptideAtlas 2012 build and
mapped it to Swiss-Prot, a protein sequence database curated to contain
one entry per human protein-coding gene. We find that this latest
PeptideAtlas build includes at least one peptide for each of ∼12500
Swiss-Prot entries, leaving ∼7500 gene products yet to be confidently
cataloged. We characterize these “PA-unseen” proteins
in terms of tissue localization, transcript abundance, and Gene Ontology
enrichment, and propose reasons for their absence from PeptideAtlas
and strategies for detecting them in the future.
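
The coverage figures above come down to a set comparison: a reference entry counts as seen if at least one identified peptide maps to it, and as "PA-unseen" otherwise. The following is a minimal sketch of that bookkeeping, not the PeptideAtlas pipeline itself. The function name `summarize_coverage`, the toy accessions, and the peptide strings are all hypothetical and chosen for illustration only. The last line simply takes the rounded counts from the abstract at face value.

```python
def summarize_coverage(reference_entries, peptide_to_entries):
    """Split reference entries into seen (>= 1 mapped peptide) and unseen.

    reference_entries: iterable of reference accessions (hypothetical IDs).
    peptide_to_entries: dict mapping each peptide to the accessions it maps to.
    """
    reference = set(reference_entries)
    seen = set()
    for entries in peptide_to_entries.values():
        # Ignore mappings to accessions outside the reference set.
        seen.update(e for e in entries if e in reference)
    unseen = reference - seen
    coverage = len(seen) / len(reference) if reference else 0.0
    return seen, unseen, coverage


# Toy data (hypothetical accessions and peptides, not real identifications).
reference_entries = ["P1", "P2", "P3", "P4"]
peptide_to_entries = {
    "PEPTIDEA": ["P1"],
    "PEPTIDEB": ["P1", "P3"],
    "PEPTIDEC": ["X9"],  # maps outside the reference set
}

seen, unseen, coverage = summarize_coverage(reference_entries, peptide_to_entries)
print(sorted(seen), sorted(unseen), coverage)

# Rounded counts from the abstract: ~12500 seen and ~7500 unseen entries.
approx_coverage = 12500 / (12500 + 7500)
print(approx_coverage)
```

With the rounded counts, the seen fraction works out to roughly 62.5% of about 20000 Swiss-Prot entries. Both inputs are approximations, so the result should be read as an estimate.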