Coverage and Consistency: Bioinformatics Aspects of
the Analysis of Multirun iTRAQ Experiments with Wheat Leaves
Abstract
The hexaploid genome of bread wheat
(<i>Triticum aestivum</i>) is large (17 Gb) and repetitive,
and this has delayed full sequencing
and annotation of the genome, which is a prerequisite for effective
quantitative proteomics analysis. Aware of these constraints, we investigated
the most effective approaches for shotgun proteomic analyses of bread
wheat that would support large-scale quantitative comparisons using
iTRAQ reagents. We used a data set that was generated by two-dimensional
LC–MS of iTRAQ-labeled peptides from wheat leaves. The main
items considered in this study were the choice of sequence database
for matching LC–MS data, the consistency of identification
when multiple LC–MS runs were acquired, and the options for
downstream functional analysis to generate useful insight. For peptide
identification we examined the extensive NCBInr plant database, a
smaller composite cereals database, the <i>Brachypodium distachyon</i> model plant genome, the EST-based SuperWheat database, and
the genome sequence of the recently sequenced D-genome progenitor <i>Aegilops tauschii</i>. Although searching against the SuperWheat
database assigned the most spectra, this extremely large database could
not be readily manipulated for the robust protein grouping that is
required for large-scale, multirun quantitative experiments. We demonstrated
a pragmatic alternative: using the composite cereals database for
peptide-spectrum matching. The stochastic aspect of protein grouping
across LC–MS runs was investigated using the smaller composite
cereals database, where we found that attaching the <i>Brachypodium</i> best BLAST hit to each protein group reduced run-to-run inconsistency. Further, assigning quantitation
to the best <i>Brachypodium</i> locus yielded promising
results, enabling integration with existing downstream data mining
and functional analysis tools. Our study demonstrates viable approaches
for quantitative proteomics analysis of bread wheat samples and shows
how these approaches could be adopted for the analysis of other
organisms with unsequenced or incompletely sequenced genomes.
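
The locus-level roll-up described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the accessions, locus identifiers, ratios, and the helper names (`group_locus`, `rollup`, `shared_across_runs`) are all hypothetical, and the majority-vote rule and median summary are assumptions chosen only to make the idea concrete. The sketch shows how protein groups whose membership differs between runs can still be compared once each group is mapped to its best <i>Brachypodium</i> BLAST hit.

```python
from collections import defaultdict
from statistics import median

# Hypothetical best-BLAST-hit table: cereal accession -> Brachypodium locus.
# All identifiers here are made-up placeholders.
BEST_HIT = {
    "wheat_A1": "Bradi1g00010",
    "wheat_A2": "Bradi1g00010",
    "rice_B1": "Bradi1g00010",
    "wheat_C1": "Bradi2g00020",
    "barley_C2": "Bradi2g00020",
}

# Hypothetical protein groups per LC-MS run: (member accessions, iTRAQ ratio).
RUNS = {
    "run1": [({"wheat_A1", "wheat_A2"}, 1.8), ({"wheat_C1"}, 0.5)],
    "run2": [({"wheat_A2", "rice_B1"}, 2.0), ({"barley_C2", "wheat_C1"}, 0.6)],
}


def group_locus(members, best_hit):
    """Return the locus most group members map to (ties broken alphabetically)."""
    counts = defaultdict(int)
    for accession in members:
        locus = best_hit.get(accession)
        if locus is not None:
            counts[locus] += 1
    if not counts:
        return None  # no member has a BLAST hit
    return min(counts, key=lambda locus: (-counts[locus], locus))


def rollup(runs, best_hit):
    """Map each run to {locus: median ratio of the groups assigned to it}."""
    result = {}
    for run, groups in runs.items():
        per_locus = defaultdict(list)
        for members, ratio in groups:
            locus = group_locus(members, best_hit)
            if locus is not None:
                per_locus[locus].append(ratio)
        result[run] = {locus: median(vals) for locus, vals in per_locus.items()}
    return result


def shared_across_runs(keys_per_run):
    """Return the keys present in every run."""
    sets = [set(keys) for keys in keys_per_run]
    return set.intersection(*sets) if sets else set()


by_locus = rollup(RUNS, BEST_HIT)
shared_groups = shared_across_runs(
    [{frozenset(members) for members, _ in groups} for groups in RUNS.values()]
)
shared_loci = shared_across_runs([ratios.keys() for ratios in by_locus.values()])

print("groups identical in all runs:", len(shared_groups))
print("loci quantified in all runs:", sorted(shared_loci))
```

In this toy input no protein group has identical membership in both runs, yet both loci are quantified in both runs, which is the kind of consistency gain the abstract attributes to attaching the best BLAST hit.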