Understanding co-expressed gene sets by identifying regulators and modeling genomic elements

Abstract

Genomic researchers commonly study complex phenotypes by identifying experimentally derived sets of functionally related genes with similar transcriptional profiles. These gene sets are then frequently subjected to statistical tests of association that relate them to previously characterized gene sets from the literature and public databases. However, few tools examine the non-coding regulatory sequences of gene sets for evidence of a shared regulatory signature that may signal the involvement of important DNA-binding proteins called transcription factors (TFs). Here, we proposed and developed new computational methods for identifying major regulatory features of co-expressed gene sets; these methods combine TF-DNA binding specificities (“motifs”) with other important features such as sequence conservation and chromatin structure. We additionally demonstrated a novel approach for discovering regulatory signatures that are shared across gene sets from multiple experimental conditions or tissues. Given the co-expressed genes of a particular cell type, we also attempted to annotate their specific regulatory sequences (“enhancers”) by constructing models of enhancer activity that incorporate the expression and binding specificities of the relevant transcription factors. We first developed and tested these models in well-characterized cell types, and then evaluated the extent to which they could be applied, using only minimal experimental evidence, to poorly characterized systems that lack known transcriptional regulators and functional enhancers. Finally, we developed a network-based algorithm for examining novel gene sets that integrates many diverse types of biological evidence and relationships to better discover functionally related genes. This novel approach processed a comprehensive, heterogeneous network of biological knowledge and ranked the genes and molecular properties represented in the network by their relevance to the given set of co-expressed genes.
