Next-generation sequencing of primary tumors is now standard for transcriptomic studies,
but microarray-based data still constitute the majority of available information on other clinically
valuable samples, including archive material. Using prostate cancer (PC) as a model, we developed
a robust analytical framework to integrate data across different technical platforms and disease
subtypes to connect distinct disease stages and reveal potentially relevant genes not identifiable from
single studies alone. We reconstructed the molecular profile of PC to yield the first comprehensive
insight into its development, by tracking changes in mRNA levels from normal prostate through high-grade
prostatic intraepithelial neoplasia to metastatic disease. A total of nine previously unreported
stage-specific candidate genes with prognostic significance were also found. Here, we integrate
gene expression data from disparate sample types, disease stages and technical platforms into one
coherent whole, to give a global view of the expression changes associated with the development
and progression of PC from normal tissue through to metastatic disease. Summary and individual
data are available online at the Prostate Integrative Expression Database (PIXdb), a user-friendly
interface designed for clinicians and laboratory researchers to facilitate translational research.