Motivation: Predictive modelling of gene expression is a powerful framework
for the in silico exploration of transcriptional regulatory interactions
through the integration of high-throughput -omics data. A major limitation of
previous approaches is their inability to handle conditional and synergistic
interactions that emerge when collectively analysing genes subject to different
regulatory mechanisms. This limitation reduces overall predictive power and
thus the reliability of downstream biological inference.
Results: We introduce an analytical modelling framework (TREEOME: tree of
models of expression) that integrates epigenetic and transcriptomic data by
separating genes into putative regulatory classes. Current predictive modelling
approaches have found that both DNA methylation and histone modification data
provide little or no improvement in the accuracy of predicted transcript
abundance, despite, for example, the distinct anti-correlation between mRNA
levels and promoter-localised DNA methylation. To address this, in TREEOME we
evaluate four candidate methods of formulating gene-level DNA methylation
metrics. These metrics provide a foundation for identifying gene-level
methylation events and for subsequent differential analysis, whereas most
previous techniques operate at the level of individual CpG dinucleotides. We demonstrate
TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone
modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript
abundance (RNA-seq) for H1-hESC and GM12878 cell lines.
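The abstract does not specify the four gene-level formulations. As an illustration only, the following minimal Python sketch shows one simple, hypothetical way to collapse CpG-level bisulfite-seq calls into a single gene-level value: the mean methylation fraction over a promoter window around the transcription start site (TSS). The window bounds (2 kb upstream, 500 bp downstream), the minimum read depth, and every name (`CpGCall`, `Gene`, `promoter_methylation`) are assumptions made for this example, not TREEOME's definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class CpGCall:
    """One CpG site from bisulfite-seq (hypothetical record layout)."""
    chrom: str
    pos: int
    methylated: int    # reads supporting a methylated C
    unmethylated: int  # reads supporting an unmethylated C (read as T)


@dataclass
class Gene:
    """Minimal gene annotation: TSS coordinate and strand ("+" or "-")."""
    name: str
    chrom: str
    tss: int
    strand: str


def promoter_methylation(gene: Gene, cpgs: Iterable[CpGCall],
                         upstream: int = 2000, downstream: int = 500,
                         min_depth: int = 5) -> Optional[float]:
    """Hypothetical gene-level metric: mean per-CpG methylation fraction
    within a promoter window. Returns None if no CpG passes the filters."""
    # "Upstream" lies at lower coordinates on the + strand, higher on the - strand.
    if gene.strand == "+":
        start, end = gene.tss - upstream, gene.tss + downstream
    else:
        start, end = gene.tss - downstream, gene.tss + upstream

    fractions = []
    for c in cpgs:
        depth = c.methylated + c.unmethylated
        # Keep only well-covered CpGs on the same chromosome inside the window.
        if c.chrom == gene.chrom and start <= c.pos <= end and depth >= min_depth:
            fractions.append(c.methylated / depth)
    return mean(fractions) if fractions else None


if __name__ == "__main__":
    calls = [
        CpGCall("chr1", 9500, 8, 2),    # in window, fraction 0.8
        CpGCall("chr1", 10200, 3, 7),   # in window, fraction 0.3
        CpGCall("chr1", 12000, 10, 0),  # outside the promoter window
        CpGCall("chr1", 9800, 1, 1),    # depth 2, below min_depth
        CpGCall("chr2", 9600, 0, 10),   # different chromosome
    ]
    gene = Gene("GENE_A", "chr1", 10000, "+")
    print(promoter_methylation(gene, calls))  # about 0.55 (mean of 0.8 and 0.3)
```

Averaging per-CpG fractions weights every site equally; pooling read counts across the window would instead weight well-covered CpGs more heavily. Choices of this kind (window placement, weighting, depth filtering) are what distinguish alternative gene-level formulations from one another.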
Availability: TREEOME is implemented using open-source software and made
available as a pre-configured bootable reference environment. All scripts and
data presented in this study are available online at
http://sourceforge.net/projects/budden2015treeome/.
Comment: 14 pages, 6 figures