Unsupervised annotation of regulatory domains by integrating functional genomic assays and Hi-C data

Abstract

In each cell type, chromosomes are organized into a specific 3D structure that controls the function of the cell through different mechanisms, including domain-scale regulation. Because genome structure correlates with genome function, several methods have been proposed to integrate 1D functional genomic and 2D Hi-C data to identify domain types. Existing methods rely on the assumption that directly connected genomic regions are more likely to have the same domain type; however, spatial clustering of genomic regions is based on both their first-order and second-order proximities. Here, we present an integrative approach that uses 1D functional genomic features and 3D interactions from Hi-C data to assign labels to genomic regions that discriminate both spatial and functional genomic patterns. We use graph embedding to learn latent variables for nodes (genomic regions) that preserve the second-order proximity of the Hi-C graph. These latent variables summarize the spatial information in Hi-C data, and we feed them, together with existing 1D functional features, to Segway, a genome annotation method, to infer domain states. We show that our labels distinguish combinations of the spatial and functional states of genomic regions; for example, loci located in the nucleus interior can be further clustered into significantly and moderately expressed domains. We also evaluated the importance of each of the spatial and functional features in explaining different cellular activities, including replication timing and gene expression profile, and showed how coupling the two feature types improves the prediction of such activities. Finally, we showed that incorporating spatial features allows finding domain types that are co-regulated even at large genomic distances from each other. Our framework can be generalized to aggregate different 1D genomic assays and 3D interactions from Hi-C to find the mechanisms behind the association of genome 3D structure and epigenetic profile.
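To make the distinction between first-order and second-order proximity concrete, the following is a minimal sketch on a toy contact matrix. It is not the graph embedding used in this work: the matrix values, the function name `second_order_proximity`, and the choice of cosine similarity between contact profiles are all assumptions made purely for illustration.

```python
import math

# Toy symmetric contact matrix for four genomic regions (hypothetical values).
# Regions 0 and 1 never contact each other directly, but both contact 2 and 3.
contacts = [
    [0, 0, 5, 4],
    [0, 0, 5, 4],
    [5, 5, 0, 0],
    [4, 4, 0, 0],
]

def first_order_proximity(matrix, i, j):
    """Direct contact strength between regions i and j."""
    return matrix[i][j]

def second_order_proximity(matrix, i, j):
    """Cosine similarity between the contact profiles of regions i and j.

    Entries for i and j themselves are left out, so the score reflects
    only how similar the two regions' contacts with other regions are.
    """
    others = [k for k in range(len(matrix)) if k not in (i, j)]
    a = [matrix[i][k] for k in others]
    b = [matrix[j][k] for k in others]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

print(first_order_proximity(contacts, 0, 1), second_order_proximity(contacts, 0, 1))
print(first_order_proximity(contacts, 0, 2), second_order_proximity(contacts, 0, 2))
```

In this toy example, regions 0 and 1 share no direct contact yet have identical contact profiles, which is the kind of relationship an approach based only on direct connections would miss. Regions 0 and 2 show the opposite pattern: a strong direct contact but no shared neighbors.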
