4 research outputs found
Correction to “Analyzing the First Drafts of the Human Proteome”
Analyzing the First Drafts of the Human Proteome
This letter analyzes two
large-scale proteomics studies published
in the same issue of <i>Nature</i>. At the time of the release,
both studies were portrayed as draft maps of the human proteome and
great advances in the field. As with the initial publication of the
human genome, these papers have broad appeal and will no doubt lead
to a great deal of further analysis by the scientific community. However,
we were intrigued by the number of protein-coding genes detected by
the two studies, numbers that far exceeded what has been reported
for the multinational Human Proteome Project effort. We carried out
a simple quality test on the data using the olfactory receptor family.
A high-quality proteomics experiment that does not specifically analyze
nasal tissues should not expect to detect many peptides for olfactory
receptors. Neither of the studies carried out experiments on nasal
tissues, yet we found peptide evidence for more than 100 olfactory
receptors in the two studies. These results suggest that the two studies
are substantially overestimating the number of protein-coding genes
they identify. We conclude that the experimental data from these two
studies should be used with caution.
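The olfactory-receptor quality test described above can be sketched in a few lines. The gene symbols, the peptide-level input, and the `olfactory_receptor_hits` helper are illustrative assumptions for this sketch, not the studies' actual data or code:

```python
import re

# Human olfactory receptor gene symbols follow the pattern ORnXm (e.g. OR1A1,
# OR51E2). Requiring a digit immediately after "OR" avoids false positives
# such as ORAI1. This is a simplified heuristic, not an authoritative list.
OR_SYMBOL = re.compile(r"^OR\d+[A-Z]+\d+$")

def olfactory_receptor_hits(identified_genes):
    """Return the sorted subset of identified gene symbols that look like
    olfactory receptors; a non-nasal proteomics run should yield few or none."""
    return sorted(g for g in set(identified_genes) if OR_SYMBOL.match(g))

# Illustrative identifications from a hypothetical experiment.
identified = ["TP53", "OR1A1", "OR2J2", "ALB", "OR51E2", "GAPDH"]
hits = olfactory_receptor_hits(identified)
print(f"{len(hits)} olfactory receptors detected: {hits}")
```

A large hit count from tissues that do not express olfactory receptors would flag likely false-positive peptide identifications, which is the logic of the test above.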
Most Highly Expressed Protein-Coding Genes Have a Single Dominant Isoform
Although
eukaryotic cells express a wide range of alternatively
spliced transcripts, it is not clear whether genes tend to express
a range of transcripts simultaneously across cells, or produce dominant
isoforms in a manner that is either tissue-specific or tissue-independent.
To date, large-scale investigations into the pattern of
transcript expression across distinct tissues have produced contradictory
results. Here, we attempt to determine whether genes express a dominant
splice variant at the protein level. We interrogate peptides from
eight large-scale human proteomics experiments and databases and find
that there is a single dominant protein isoform, irrespective of tissue
or cell type, for the vast majority of the protein-coding genes in
these experiments, in partial agreement with the conclusions from
the most recent large-scale RNAseq study. Remarkably, the dominant
isoforms from the experimental proteomics analyses coincided overwhelmingly
with the reference isoforms selected by two completely orthogonal
sources: the consensus coding sequence (CCDS) variants, which are agreed
upon by separate manual genome curation teams, and the principal isoforms
from the APPRIS database, predicted automatically from the conservation
of protein sequence, structure, and function.
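One simple way to call a dominant isoform from peptide evidence is a majority-fraction rule over isoform-specific peptides. The `dominant_isoform` helper, the cutoff value, and the isoform IDs below are assumptions for illustration; the paper's actual criteria are not reproduced here:

```python
from collections import Counter

def dominant_isoform(peptide_hits, min_fraction=0.75):
    """peptide_hits: one isoform ID per isoform-specific peptide observed.
    Returns the isoform carrying at least min_fraction of the evidence,
    or None when no single isoform clearly dominates."""
    counts = Counter(peptide_hits)
    isoform, top = counts.most_common(1)[0]
    return isoform if top / sum(counts.values()) >= min_fraction else None

# One isoform carries 9 of 10 peptides: called dominant.
print(dominant_isoform(["ENST-A"] * 9 + ["ENST-B"]))  # ENST-A
# Evidence split evenly: no dominant isoform is called.
print(dominant_isoform(["ENST-A", "ENST-B"]))         # None
```

A per-tissue version of the same tally would distinguish the tissue-specific from the tissue-independent scenario discussed above.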
General Statistical Framework for Quantitative Proteomics by Stable Isotope Labeling
The combination of stable isotope
labeling (SIL) with mass spectrometry
(MS) allows comparison of the abundance of thousands of proteins in
complex mixtures. However, interpretation of the large data sets generated
by these techniques remains a challenge because appropriate statistical
standards are lacking. Here, we present a generally applicable model
that accurately explains the behavior of data obtained using current
SIL approaches, including <sup>18</sup>O, iTRAQ, and SILAC labeling,
and different MS instruments. The model decomposes the total technical
variance into the spectral, peptide, and protein variance components,
and its general validity was demonstrated by testing 48 experimental
distributions against 18 different null hypotheses. In addition to
its general applicability, the performance of the algorithm was at
least comparable to that of other existing methods. The model also
provides a general framework to integrate quantitative and error information
fully, allowing a comparative analysis of the results obtained from
different SIL experiments. The model was applied to the global analysis
of protein alterations induced by low H<sub>2</sub>O<sub>2</sub> concentrations
in yeast, demonstrating the increased statistical power that may be
achieved by rigorous data integration. Our results highlight the importance
of establishing an adequate and validated statistical framework for
the analysis of high-throughput data.
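The nested variance structure the model assumes (spectral noise within peptides, peptide effects within proteins) can be illustrated with a small simulation. The component values and sampling depths below are arbitrary assumptions for this sketch, not fitted values from the paper:

```python
import random
import statistics

# Assumed model: each spectrum's log-ratio = protein effect + peptide effect
# + spectral noise, so the total technical variance is the sum of the three
# component variances. The numbers below are illustrative only.
random.seed(1)
s2_protein, s2_peptide, s2_spectral = 0.04, 0.02, 0.01

log_ratios = []
for _ in range(200):                       # proteins
    p = random.gauss(0, s2_protein ** 0.5)
    for _ in range(4):                     # peptides per protein
        q = p + random.gauss(0, s2_peptide ** 0.5)
        for _ in range(3):                 # spectra per peptide
            log_ratios.append(q + random.gauss(0, s2_spectral ** 0.5))

total = statistics.pvariance(log_ratios)
expected = s2_protein + s2_peptide + s2_spectral
print(f"observed total variance {total:.3f} vs expected {expected:.3f}")
```

In the simulated data the observed total variance lands near the 0.07 sum of the components, which is the additivity that a decomposition of technical variance relies on.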