215 research outputs found

    Methods for comparative ChIA-PET and Hi-C data analysis.

    The three-dimensional architecture of chromatin in the nucleus is important for genome regulation and function. Advanced high-throughput sequencing-based methods have been developed for capturing chromatin interactions (Hi-C, genome-wide chromosome conformation capture) or enriching for those involving a specific protein (ChIA-PET, chromatin interaction analysis with paired-end tag sequencing). There is widespread interest in utilizing and interpreting ChIA-PET and Hi-C. We review methods for comparative ChIA-PET and Hi-C data analysis and visualization. The topics reviewed include: downloading ChIA-PET and Hi-C data from the ENCODE and 4DN portals; processing ChIA-PET data using ChIA-PIPE; processing Hi-C data using Juicer or distiller and cooler; viewing 2D contact maps using Juicebox or Higlass; viewing peaks, loops, and domains using BASIC Browser; annotating convergent and tandem CTCF loops
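    The loop annotation step mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes that the CTCF motif strand ("+" or "-") has already been called at each loop anchor; the function name and the inclusion of a "divergent" class are illustrative choices and are not taken from the reviewed pipeline.

```python
def classify_ctcf_loop(left_strand: str, right_strand: str) -> str:
    """Classify a loop by the CTCF motif strands at its two anchors.

    left_strand / right_strand: "+" or "-" for the upstream and downstream
    anchor respectively (hypothetical inputs for this sketch).
    """
    valid = {"+", "-"}
    if left_strand not in valid or right_strand not in valid:
        raise ValueError("strands must be '+' or '-'")
    if left_strand == right_strand:
        # Both motifs point the same way along the chromosome.
        return "tandem"
    if left_strand == "+":
        # Motifs point toward each other across the loop.
        return "convergent"
    # Remaining case: "-" upstream and "+" downstream, pointing away.
    return "divergent"


if __name__ == "__main__":
    for pair in [("+", "-"), ("+", "+"), ("-", "-"), ("-", "+")]:
        print(pair, classify_ctcf_loop(*pair))
```

    Anchors without a confidently oriented motif would need a separate "unknown" category, which this sketch leaves out.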

    pyBedGraph: a python package for fast operations on 1D genomic signal tracks.

    MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed. RESULTS: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 s and can compute their approximate means in AVAILABILITY AND IMPLEMENTATION: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
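    The core task described here, looking up the exact mean signal over many intervals, can be sketched in plain Python with a cumulative-sum index. This is not pyBedGraph's API or its actual implementation; the function names are hypothetical, the input is assumed to be sorted, non-overlapping half-open bedGraph intervals for a single chromosome, and uncovered bases are assumed to count as zero signal.

```python
from bisect import bisect_right
from itertools import accumulate


def build_index(intervals):
    """Precompute cumulative signal for sorted, non-overlapping intervals.

    intervals: list of (start, end, value) with half-open coordinates,
    as in one chromosome of a bedGraph file.
    """
    starts = [s for s, _, _ in intervals]
    ends = [e for _, e, _ in intervals]
    values = [v for _, _, v in intervals]
    # cum[i] = total signal contained in intervals 0 .. i-1
    cum = [0.0] + list(accumulate((e - s) * v for s, e, v in intervals))
    return starts, ends, values, cum


def signal_up_to(index, pos):
    """Total signal over [0, pos); gaps between intervals contribute zero."""
    starts, ends, values, cum = index
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i < 0:
        return 0.0
    covered = min(pos, ends[i]) - starts[i]
    return cum[i] + covered * values[i]


def mean_signal(index, start, end):
    """Exact mean signal over the half-open region [start, end)."""
    if end <= start:
        raise ValueError("region must have positive length")
    return (signal_up_to(index, end) - signal_up_to(index, start)) / (end - start)


if __name__ == "__main__":
    idx = build_index([(0, 10, 1.0), (10, 20, 3.0), (30, 40, 2.0)])
    print(mean_signal(idx, 5, 15))
```

    After the one-time index build, each query costs two binary searches, so the lookup time does not grow with the length of the queried region.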

    Enhanced performance of gene expression predictive models with protein-mediated spatial chromatin interactions.

    There have been multiple attempts to predict the expression of genes based on sequence, epigenetics, and various other factors. To improve those predictions, we have decided to investigate adding protein-specific 3D interactions that play a significant role in the condensation of the chromatin structure in the cell nucleus. To achieve this, we have used the architecture of one of the state-of-the-art algorithms, ExPecto, and investigated the changes in the model metrics upon adding the spatially relevant data. We have used ChIA-PET interactions that are mediated by cohesin (24 cell lines), CTCF (4 cell lines), and RNAPOL2 (4 cell lines). As the output of the study, we have developed the Spatial Gene Expression (SpEx) algorithm that shows statistically significant improvements in most cell lines. We have compared ourselves to the baseline ExPecto model, which obtained a 0.82 Spearman's rank correlation coefficient (SCC) score, and to the 0.85 reported by the newer Enformer; we were able to obtain an average correlation score of 0.83. However, in some cases (e.g. RNAPOL2 on GM12878), our improvement reached 0.04, and in some cases (e.g. RNAPOL2 on H1), we reached an SCC of 0.86
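    The Spearman's rank correlation coefficient used as the evaluation metric above can be computed as the Pearson correlation of the ranks. The sketch below is a generic pure-Python version with average ranks for ties; it is not the evaluation code of SpEx, ExPecto, or Enformer, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [0.1, 0.4, 0.35, 0.8]  # made-up values for illustration only
    observed = [1.0, 3.0, 2.0, 9.0]
    print(spearman(predicted, observed))
```

    Because only ranks matter, any monotonic transformation of predicted or observed expression leaves the score unchanged.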

    MIA-Sig: multiplex chromatin interaction analysis by signal processing and statistical algorithms.

    The single-molecule multiplex chromatin interaction data are generated by emerging 3D genome mapping technologies such as GAM, SPRITE, and ChIA-Drop. These datasets provide insights into high-dimensional chromatin organization, yet introduce new computational challenges. Thus, we developed MIA-Sig, an algorithmic solution based on signal processing and information theory. We demonstrate its ability to de-noise the multiplex data, assess the statistical significance of chromatin complexes, and identify topological domains and frequent inter-domain contacts. On chromatin immunoprecipitation (ChIP)-enriched data, MIA-Sig can clearly distinguish the protein-associated interactions from the non-specific topological domains. Together, MIA-Sig represents a novel algorithmic framework for multiplex chromatin interaction analysis

    Spatial chromatin architecture alteration by structural variations in human genomes at the population scale.

    BACKGROUND: The number of reported examples of chromatin architecture alterations involved in the regulation of gene transcription and in disease is increasing. However, no genome-wide testing has been performed to assess the abundance of these events and their importance relative to other factors affecting genome regulation. This is particularly interesting given that a vast majority of genetic variations identified in association studies are located outside coding sequences. This study attempts to address this lack by analyzing the impact on chromatin spatial organization of genetic variants identified in individuals from 26 human populations and in genome-wide association studies. RESULTS: We assess the tendency of structural variants to accumulate in spatially interacting genomic segments and design an algorithm to model chromatin conformational changes caused by structural variations. We show that differential gene transcription is closely linked to the variation in chromatin interaction networks mediated by RNA polymerase II. We also demonstrate that CTCF-mediated interactions are well conserved across populations, but enriched with disease-associated SNPs. Moreover, we find boundaries of topological domains as relatively frequent targets of duplications, which suggest that these duplications can be an important evolutionary mechanism of genome spatial organization. CONCLUSIONS: This study assesses the critical impact of genetic variants on the higher-order organization of chromatin folding and provides insight into the mechanisms regulating gene transcription at the population scale, of which local arrangement of chromatin loops seems to be the most significant. It provides the first insight into the variability of the human 3D genome at the population scale

    Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data.

    Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments using Hi-C chromatin interaction data. We find that the network topological centrality and clustering performance of SCI sub-compartment predictions are superior to those of hidden Markov model (HMM) sub-compartment predictions. Moreover, using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we confirmed that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. Moreover, we present a deep neural network to predict sub-compartments using epigenome, replication timing, and sequence data. Our neural network predicts more accurate sub-compartment predictions when SCI-determined sub-compartments are used as labels for training

    3D-GNOME 2.0: a three-dimensional genome modeling engine for predicting structural variation-driven alterations of chromatin spatial structure in the human genome.

    Structural variants (SVs) that alter DNA sequence emerge as a driving force involved in the reorganisation of DNA spatial folding, thus affecting gene transcription. In this work, we describe an improved version of our integrated web service for structural modeling of three-dimensional genome (3D-GNOME), which now incorporates all types of SVs to model changes to the reference 3D conformation of chromatin. In 3D-GNOME 2.0, the default reference 3D genome structure is generated using ChIA-PET data from the GM12878 cell line and SVs data are sourced from the population-scale catalogue of SVs identified by the 1000 Genomes Consortium. However, users may also submit their own structural data to set a customized reference genome structure, and/or a custom input list of SVs. 3D-GNOME 2.0 provides novel tools to inspect, visualize and compare 3D models for regions that differ in terms of their linear genomic sequence. Contact diagrams are displayed to compare the reference 3D structure with the one altered by SVs. In our opinion, 3D-GNOME 2.0 is a unique online tool for modeling and analyzing conformational changes to the human genome induced by SVs across populations. It can be freely accessed at https://3dgnome.cent.uw.edu.pl/

    In situ Chromatin Interaction Analysis Using Paired-End Tag Sequencing.

    Chromatin Interaction Analysis Using Paired-End Tag Sequencing (ChIA-PET) is an established method to map protein-mediated chromatin interactions. A limitation, however, is that it requires a hundred million cells per experiment, which hampers its broad application in biomedical research, particularly in studies in which it is impractical to obtain a large number of cells from rare samples. To reduce the required input cell number while retaining high data quality, we developed an in situ ChIA-PET protocol, which requires as few as 1 million cells. Here, we describe detailed step-by-step procedures for performing in situ ChIA-PET from cultured cells, including both an experimental protocol for sample preparation and data generation and a computational protocol for data processing and visualization using the ChIA-PIPE pipeline. As the protocol significantly simplifies the experimental procedure, reduces ligation noise, and decreases the required input of cells compared to previous versions of ChIA-PET protocols, it can be applied to generate high-resolution chromatin contact maps mediated by various protein factors for a wide range of human and mouse primary cells. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Sample preparation and data generation. Support Protocol: Bridge linker preparation. Basic Protocol 2: Data processing and visualization.

    Chromatin topology reorganization and transcription repression by PML-RARα in acute promyeloid leukemia.

    BACKGROUND: Acute promyeloid leukemia (APL) is characterized by the oncogenic fusion protein PML-RARα, a major etiological agent in APL. However, the molecular mechanisms underlying the role of PML-RARα in leukemogenesis remain largely unknown. RESULTS: Using an inducible system, we comprehensively analyze the 3D genome organization in myeloid cells and its reorganization after PML-RARα induction and perform additional analyses in patient-derived APL cells with native PML-RARα. We discover that PML-RARα mediates extensive chromatin interactions genome-wide. Globally, it redefines the chromatin topology of the myeloid genome toward a more condensed configuration in APL cells; locally, it intrudes RNAPII-associated interaction domains, interrupts myeloid-specific transcription factors binding at enhancers and super-enhancers, and leads to transcriptional repression of genes critical for myeloid differentiation and maturation. CONCLUSIONS: Our results not only provide novel topological insights for the roles of PML-RARα in transforming myeloid cells into leukemia cells, but further uncover a topological framework of a molecular mechanism for oncogenic fusion proteins in cancers

    Multi-scale phase separation by explosive percolation with single-chromatin loop resolution.

    The 2 m-long human DNA is tightly intertwined into the cell nucleus of the size of 10 μm. The DNA packing is explained by folding of chromatin fiber. This folding leads to the formation of such hierarchical structures as: chromosomal territories, compartments; densely-packed genomic regions known as Topologically Associating Domains (TADs), or Chromatin Contact Domains (CCDs), and loops. We propose models of dynamical human genome folding into hierarchical components in human lymphoblastoid, stem cell, and fibroblast cell lines. Our models are based on explosive percolation theory. The chromosomes are modeled as graphs where CTCF chromatin loops are represented as edges. The folding trajectory is simulated by gradually introducing loops to the graph following various edge addition strategies that are based on topological network properties, chromatin loop frequencies, compartmentalization, or epigenomic features. Finally, we propose the genome folding model - a biophysical pseudo-time process guided by a single scalar order parameter. The parameter is calculated by Linear Discriminant Analysis of chromatin features. We also include dynamics of loop formation by using Loop Extrusion Model (LEM) while adding them to the system. The chromatin phase separation, where fiber folds in 3D space into topological domains and compartments, is observed when the critical number of contacts is reached. We also observe that at least 80% of the loops are needed for chromatin fiber to condense in 3D space, and this is constant through various cell lines. Overall, ou