    Enhanced performance of gene expression predictive models with protein-mediated spatial chromatin interactions.

    There have been multiple attempts to predict gene expression from sequence, epigenetics, and various other factors. To improve those predictions, we investigated adding protein-specific 3D interactions that play a significant role in the condensation of the chromatin structure in the cell nucleus. To achieve this, we used the architecture of one of the state-of-the-art algorithms, ExPecto, and investigated the changes in the model metrics upon adding the spatially relevant data. We used ChIA-PET interactions mediated by cohesin (24 cell lines), CTCF (4 cell lines), and RNAPOL2 (4 cell lines). As the output of the study, we developed the Spatial Gene Expression (SpEx) algorithm, which shows statistically significant improvements in most cell lines. Compared with the baseline ExPecto model, which obtained a Spearman's rank correlation coefficient (SCC) of 0.82, and with the 0.85 reported by the newer Enformer, we obtained an average correlation score of 0.83. However, in some cases (e.g. RNAPOL2 on GM12878) our improvement reached 0.04, and in others (e.g. RNAPOL2 on H1) we reached an SCC of 0.86.
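The SCC figures quoted above are Spearman's rank correlations between predicted and measured expression. As a minimal sketch of how such a score is computed (the function names and the toy values are illustrative assumptions, not taken from SpEx or ExPecto), one can rank both vectors and take the Pearson correlation of the ranks:

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical expression values for five genes (made up for illustration).
measured = [0.2, 1.5, 3.1, 4.0, 7.3]
predicted = [0.9, 0.4, 3.8, 2.9, 6.0]
print(round(spearman(measured, predicted), 2))
```

Because only ranks enter the calculation, the score is insensitive to any monotonic rescaling of the predicted expression values.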

    Super-resolution visualization of chromatin loop folding in human lymphoblastoid cells using interferometric photoactivated localization microscopy.

    The three-dimensional (3D) genome structure plays a fundamental role in gene regulation and cellular functions. Recent studies in 3D genomics inferred the very basic functional chromatin folding structures known as chromatin loops, the long-range chromatin interactions that are mediated by protein factors and dynamically extruded by cohesin. We combined FISH staining of a very short (33 kb) chromatin fragment, interferometric photoactivated localization microscopy (iPALM), and a traveling salesman problem-based heuristic algorithm that reconstructs the loop from an image of one of the strongest CTCF-mediated chromatin loops in human lymphoblastoid cells. In total, we generated thirteen good-quality images of the target chromatin region with 2-22 nm oligo probe localization precision. We visualized the shape of single chromatin loops with unprecedented genomic resolution, which allowed us to study the structural heterogeneity of chromatin looping. We were able to compare the physical distance maps from all reconstructed image-driven computational models with contact frequencies observed by the ChIA-PET and Hi-C genomic-driven methods to examine the concordance between single-cell imaging and population-based genomic data.
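The abstract does not say which traveling salesman heuristic was used. As a rough illustration of the general idea only, the sketch below orders a set of 3D probe localizations into a short open path using a nearest-neighbour tour followed by 2-opt refinement. Both the choice of heuristic and the toy coordinates are assumptions made for this example.

```python
import math


def path_length(points, order):
    """Total Euclidean length of the open path visiting points in `order`."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def nearest_neighbour_path(points, start=0):
    """Greedy tour: always step to the closest point not yet visited."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points, order):
    """Reverse path segments for as long as a reversal shortens the path.

    The first point is kept fixed as the start of the path.
    """
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(points, candidate) < path_length(points, order) - 1e-12:
                    order = candidate
                    improved = True
    return order


# Hypothetical localizations in nanometres (made up for illustration).
localizations = [(0, 0, 0), (30, 0, 0), (10, 0, 0), (20, 0, 0)]
order = two_opt(localizations, nearest_neighbour_path(localizations))
print(order, path_length(localizations, order))
```

The resulting ordering gives a candidate trace of the chromatin fibre through the localizations, from which a pairwise physical distance map can be derived.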

    Investigate Genomic 3D Structure Using Deep Neural Network

    The 3D structures of chromosomes play fundamental roles in essential cellular functions, e.g., gene regulation, gene expression, and evolution, and the Hi-C technique provides the interaction density between loci on chromosomes. In this dissertation, we developed multiple algorithms, focusing on the deep learning approach, to study Hi-C datasets and genomic 3D structures. Building the 3D structure of the genome is one of the most critical purposes of the Hi-C technique. Recently, several approaches have been developed to reconstruct the 3D model of the chromosomes from Hi-C data. However, all of these methods are based on a particular mathematical model and lack flexibility for new development. We introduce a novel approach using the genetic algorithm. Our approach is flexible enough to accept any mathematical model to build a 3D chromosomal structure. Also, our approach outperforms current techniques in accuracy. Although an increasing number of Hi-C datasets have been generated in a variety of tissue/cell types, due to high sequencing cost the resolution of most Hi-C datasets is coarse, and they cannot be used to infer important biological functions (e.g., enhancer-promoter interactions) or to link disease-related non-coding variants to their target genes. To address this challenge, we develop HiCPlus, a computational approach based on a deep convolutional neural network, to infer high-resolution Hi-C interaction matrices from low-resolution Hi-C data. Through extensive testing, we demonstrate that HiCPlus can impute interaction matrices highly similar to the original ones while using as few as 1/16 of the total sequencing reads. We observe that Hi-C interaction matrices contain unique local features that are consistent across different cell types, and such features can be effectively captured by the deep learning framework. We further apply HiCPlus to enhance and expand the usability of Hi-C datasets in a variety of tissue and cell types.
In summary, our work not only provides a framework to generate high-resolution Hi-C matrices with a fraction of the sequencing cost but also reveals features underlying the formation of 3D chromatin interactions. The noise level in Hi-C data is high, and the structure of the noise is complicated. Also, even under the strictest experimental conditions, completely noise-free Hi-C data cannot be obtained. We proposed a novel approach to learn a denoising network without clean data. Our approach employs a Siamese structure, utilizing two replicates from the same experimental settings to train the model; the resulting model can then be applied to datasets where only one replicate is available. We applied our new approach to enhance Hi-C data, an important type of data in exploring three-dimensional genomic structures. The results show that the model trained by our method significantly reduces the noise level in Hi-C data. In the past few years, we have seen an explosion of Hi-C data in a variety of cell/tissue types. While these publicly available data present an unprecedented opportunity to interrogate chromosomal architecture, how to quantitatively compare Hi-C data from different tissues and identify tissue-specific chromatin interactions remains challenging. We developed HiCComp, a comprehensive framework for comparing Hi-C data. HiCComp utilizes convolutional neural networks to extract key features in Hi-C interaction matrices in a fully automatic way. The core component of HiCComp is a triplet network, which contains three identical convolutional neural networks with shared parameters. The inputs to our network are three Hi-C matrices: two of them are biological replicates from the same cell type, and the third one is from another cell type. The HiCComp network takes advantage of the two biological replicates to estimate the natural variation in the experiments and then uses it to identify significant variations between Hi-C matrices from different cell types.
Furthermore, we incorporate a systematic occlusion method into our framework so that we can identify the dynamic interaction regions from Hi-C maps. Finally, we show that the dynamic regions between two cell types are enriched for transcription factor binding sites and histone modifications that are associated with cis-regulatory functions, suggesting these variations in 3D genome structure are potentially gene regulatory events.
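The triplet arrangement described above (two replicates of one cell type plus a matrix from another cell type, all passed through networks with shared parameters) can be sketched with the standard triplet margin loss. This is an illustrative assumption: the abstract does not state HiCComp's actual loss, and the linear `embed` function below is a hypothetical stand-in for the convolutional networks.

```python
import math


def embed(matrix, weights):
    """Hypothetical stand-in for the shared network: a linear map of the
    flattened contact matrix. The same `weights` are used for every input."""
    flat = [value for row in matrix for value in row]
    return [sum(w * x for w, x in zip(row_w, flat)) for row_w in weights]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, weights, margin=1.0):
    """Standard triplet margin loss: the replicate pair should be closer in
    embedding space than the cross-cell-type pair by at least `margin`."""
    a, p, n = (embed(m, weights) for m in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


# Toy 2x2 "contact matrices" (made up for illustration).
replicate_1 = [[1.0, 0.0], [0.0, 1.0]]
replicate_2 = [[1.0, 0.1], [0.1, 1.0]]
other_cell_type = [[0.0, 1.0], [1.0, 0.0]]
shared_weights = [[1, 0, 0, 0], [0, 1, 0, 0]]  # hypothetical parameters

print(triplet_loss(replicate_1, replicate_2, other_cell_type, shared_weights))
```

In this formulation the distance between the two replicates serves as an estimate of natural experimental variation, and a difference between cell types counts as significant only when it exceeds that distance by the margin.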

    Integrative construction of regulatory region networks in 127 human reference epigenomes by matrix factorization

    © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. Despite large experimental and computational efforts aiming to dissect the mechanisms underlying disease risk, mapping cis-regulatory elements to target genes remains a challenge. Here, we introduce a matrix factorization framework to integrate physical and functional interaction data of genomic segments. The framework was used to predict a regulatory network of chromatin interaction edges linking more than 20 000 promoters and 1.8 million enhancers across 127 human reference epigenomes, including edges that are present in any of the input datasets. Our network integrates functional evidence of correlated activity patterns from epigenomic data and physical evidence of chromatin interactions. An important contribution of this work is the representation of heterogeneous data with different qualities as networks. We show that the unbiased integration of independent data sources suggestive of regulatory interactions produces meaningful associations supported by existing functional and physical evidence, correlating with expected independent biological features
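The abstract does not specify the factorization model. As a generic illustration of how matrix factorization can score segment pairs, including pairs with no observed value, the sketch below fits a low-rank model to the observed entries by stochastic gradient descent. The function names, the hyperparameters, and the toy matrix are all assumptions made for this example, not the authors' framework.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def factorize(matrix, rank=2, steps=2000, lr=0.01, seed=0):
    """Fit matrix[i][j] ~ dot(U[i], V[j]) on observed (non-None) entries."""
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    U = [[rng.random() for _ in range(rank)] for _ in range(n)]
    V = [[rng.random() for _ in range(rank)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if matrix[i][j] is None:
                    continue  # unobserved pair: contributes no gradient
                err = matrix[i][j] - dot(U[i], V[j])
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * err * v
                    V[j][k] += lr * err * u
    return U, V


def reconstruction_error(matrix, U, V):
    """Mean squared error over the observed entries."""
    errors = [
        (matrix[i][j] - dot(U[i], V[j])) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
        if matrix[i][j] is not None
    ]
    return sum(errors) / len(errors)


# Toy promoter-by-enhancer evidence matrix; None marks an unobserved pair.
evidence = [[1.0, 0.0, 1.0], [0.0, 1.0, None], [1.0, 0.0, 1.0]]
U, V = factorize(evidence)
print(reconstruction_error(evidence, U, V), dot(U[1], V[2]))
```

The last printed value is the model's score for the unobserved pair, which is how a low-rank model can propose edges between segments lacking direct evidence.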

    Identifying noncoding risk variants using disease-relevant gene regulatory networks.

    Identifying noncoding risk variants remains a challenging task. Because noncoding variants exert their effects in the context of a gene regulatory network (GRN), we hypothesize that explicit use of disease-relevant GRNs can significantly improve the inference accuracy of noncoding risk variants. We describe Annotation of Regulatory Variants using Integrated Networks (ARVIN), a general computational framework for predicting causal noncoding variants. It employs a set of novel regulatory network-based features, combined with sequence-based features to infer noncoding risk variants. Using known causal variants in gene promoters and enhancers in a number of diseases, we show ARVIN outperforms state-of-the-art methods that use sequence-based features alone. Additional experimental validation using reporter assay further demonstrates the accuracy of ARVIN. Application of ARVIN to seven autoimmune diseases provides a holistic view of the gene subnetwork perturbed by the combinatorial action of the entire set of risk noncoding mutations. Nat Commun 2018 Feb 16; 9(1):702