Integrative Analysis of Modular Structure of Genes in High-throughput Tumor Profiles

Abstract

Cellular functions such as signal transduction, transport, the cell cycle, and various metabolic processes require the cooperation of many gene products. Following the central dogma, such large-scale cooperation within and across cells often leaves traces in different omics profiles. One major clue is strong correlation among genes in genomic, epigenomic, transcriptomic, and proteomic data. Based on this premise, we first identified functional modules by integrating pairwise gene correlations from different information sources into multiplex networks. All layers of the multiplex share the same protein interactome as their skeleton, but the edge weights in each layer represent pairwise correlation derived from a different type of information source. This formulation allows information to flow from one data source to another. We also designed a novel graph clustering algorithm to detect gene sets with strong internal correlation. However, multiplex integration yielded only marginal improvement over single-omics analysis.

We therefore turned to the mutual exclusivity patterns in cancer genomics. These patterns suggest that a single somatic alteration event in a pathway may be sufficient to promote tumorigenesis. We pushed this assumption further, positing that disruption of a single pathway could lead to differential expression of a large set of genes, a hypothesis supported by our work on Boolean matrix factorization. We then proposed the OR-gate network (ORN) to model the causal mechanism linking somatic alterations to transcriptomic changes. Results showed that ORN recovers both the heterogeneity among cancer samples and the functional modules responsible for specific dysregulation in cancer transcriptomes.

Still, ORN has two major limitations. The first is co-amplification: ORN cannot distinguish driver genes from passenger genes located in the same copy number variation hotspot. To address this, we applied the word2vec model to extract gene embeddings from the biomedical literature.
The second is that the transcriptional regulation module may be inaccurate. To address this, we developed a novel algorithm, peak2vec, to uncover transcriptional motif patterns and coregulation from chromatin accessibility profiles. In the future, we will integrate gene embeddings and peak2vec into the ORN framework to better understand the causal impact of somatic alterations in terms of functional modules.
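The abstract does not state the ORN equations, so the following is only a hedged sketch of two generic OR-gate formulations that fit the description of the OR-gate network and Boolean matrix factorization given above. The Boolean matrix product marks a gene as dysregulated in a sample if any disrupted pathway targets it; the noisy-OR variant attaches a probability to each edge plus a leak term. The function names `boolean_product` and `noisy_or`, the noisy-OR parameterization, the samples × pathways / pathways × genes layout, and the toy matrices are all assumptions for illustration, not the dissertation's implementation.

```python
# Illustrative sketch only (assumption): two generic OR-gate formulations,
# not the ORN implementation described in the dissertation.
from typing import List, Sequence

Matrix = List[List[int]]


def boolean_product(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: C[i][j] = OR_k (A[i][k] AND B[k][j]).

    a: samples x pathways (1 = pathway disrupted by a somatic alteration)
    b: pathways x genes   (1 = pathway regulates the gene)
    """
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [
        [int(any(a[i][k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(len(a))
    ]


def noisy_or(active: Sequence[int], weights: Sequence[float], leak: float = 0.0) -> float:
    """Probability that a target gene is dysregulated under a noisy-OR gate.

    P(on) = 1 - (1 - leak) * prod over active parents k of (1 - weights[k]).
    Any single active parent can switch the target on by itself.
    """
    p_off = 1.0 - leak
    for x, w in zip(active, weights):
        if x:
            p_off *= 1.0 - w
    return 1.0 - p_off


if __name__ == "__main__":
    # Hypothetical toy data: 3 samples, 2 pathways, 3 genes.
    alterations = [[1, 0], [0, 1], [0, 0]]
    targets = [[1, 1, 0], [0, 1, 1]]
    print(boolean_product(alterations, targets))  # [[1, 1, 0], [0, 1, 1], [0, 0, 0]]
    print(noisy_or([1, 1], [0.5, 0.5]))  # 0.75
```

Because the gate fires when any parent is active, one altered pathway is already enough to dysregulate its targets, which matches the single-hit reading of mutual exclusivity; in the toy noisy-OR call, a second active parent raises the probability only from 0.5 to 0.75.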
