We present an applied study in cancer genomics for integrating data and
inferences from laboratory experiments on cancer cell lines with observational
data obtained from human breast cancer studies. The biological focus is on
improving understanding of transcriptional responses of tumors to changes in
the pH level of the cellular microenvironment. The statistical focus is on
connecting experimentally defined biomarkers of such responses to clinical
outcome in observational studies of breast cancer patients. Our analysis
exemplifies a general strategy for accomplishing this kind of integration
across contexts. The statistical methodologies employed here draw heavily on
Bayesian sparse factor models for identifying, modularizing and correlating
with clinical outcome these signatures of aggregate changes in gene expression.
By projecting patterns of biological response linked to specific experimental
interventions into observational studies where such responses may be evidenced
via variation in gene expression across samples, we are able to define
biomarkers of clinically relevant physiological states and outcomes that are
rooted in the biology of the original experiment. Through this approach we
identify microenvironment-related prognostic factors capable of predicting long
term survival in two independent breast cancer datasets. These results suggest
possible directions for future laboratory studies, as well as indicate the
potential for therapeutic advances though targeted disruption of specific
pathway components.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS261 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org