    An Algorithm for Cellular Reprogramming

    The day we understand the time evolution of subcellular elements at a level of detail comparable to physical systems governed by Newton's laws of motion seems far away. Even so, quantitative approaches to cellular dynamics add to our understanding of cell biology, providing data-guided frameworks that allow us to develop better predictions about and methods for control over specific biological processes and system-wide cell behavior. In this paper we describe an approach to optimizing the use of transcription factors in the context of cellular reprogramming. We construct an approximate model for the natural evolution of a synchronized population of fibroblasts, based on data obtained by sampling the expression of some 22,083 genes at several times along the cell cycle. (These data are based on a colony of cells that have been cell-cycle synchronized.) In order to arrive at a model of moderate complexity, we cluster gene expression based on the division of the genome into topologically associating domains (TADs) and then model the dynamics of the expression levels of the TADs. Based on this dynamical model and known bioinformatics, we develop a methodology for identifying the transcription factors that are the most likely to be effective toward a specific cellular reprogramming task. The approach used is based on a device commonly used in optimal control. From this data-guided methodology, we identify a number of validated transcription factors used in reprogramming and/or natural differentiation. Our findings highlight the immense potential of dynamical models, mathematics, and data-guided methodologies for improving methods for control over biological processes.

    Unsupervised annotation of regulatory domains by integrating functional genomic assays and Hi-C data

    In each cell type, chromosomes are organized into a specific 3D structure that controls the function of a cell through different mechanisms including domain-scale regulation. Because of the correlation between genome structure and its function, different methods have been proposed to integrate 1D functional genomic and 2D Hi-C data to identify domain types. Existing methods rely on the assumption that directly connected genomic regions are more likely to have the same domain type; however, spatial clustering of genomic regions is based on both their first-order and second-order proximities. Here, we present an integrative approach that uses 1D functional genomic features and 3D interactions from Hi-C data to assign labels to genomic regions that can discriminate both spatial and functional genomic patterns. We use graph embedding to learn latent variables for nodes (genomic regions) that preserve the Hi-C graph second-order proximity. Such latent variables summarize spatial information in Hi-C data, and we feed them, in addition to existing 1D functional features, to Segway, a genome annotation method, to infer domain states. We show that our labels distinguish a combination of the spatial and functional states of the genomic regions; for example, loci located in the nuclear interior can be further clustered into significantly and moderately expressed domains. We also determined the importance of each of the spatial and functional features in explaining different cell activities, including replication timing and gene expression profile, and how coupling the two feature types improves the prediction of such activities. Finally, we showed that incorporating spatial features allows finding domain types that are co-regulated even at large genomic distances from each other. Our framework can be generalized to aggregate different 1D genomic assays and 3D interactions from Hi-C to find the mechanisms behind the association of genome 3D structure and epigenetic profile.

    CaTCHing the functional and structural properties of chromosome folding

    Proper development requires that genes are expressed at the right time, in the right tissue, and at the right transcriptional level. In metazoans, this involves long-range cis-regulatory elements such as enhancers, which can be located up to hundreds of kilobases away from their target promoters. How enhancers find their target genes and avoid aberrant interactions with non-target genes is currently under intense investigation. The predominant model for enhancer function involves direct physical looping between the enhancer and its target promoter. The three-dimensional organization of chromatin, which accommodates promoter-enhancer interactions, might therefore play an important role in the specificity of these interactions. In the last decade, the development of a class of techniques called chromosome conformation capture (3C) and its derivatives has revolutionized the field of chromatin folding. In particular, the genome-wide version of 3C, Hi-C, revealed that mammalian chromosomes possess a rich hierarchy of folding layers, from multi-megabase compartments corresponding to mutually exclusive associations of active and inactive chromatin to topologically associating domains (TADs), which reflect regions with preferential internal interactions. Although the mechanisms that give rise to this hierarchy are still poorly understood, there is increasing evidence to suggest that TADs represent fundamental functional units for establishing the correct pattern of enhancer-promoter interactions. This is thought to occur through two complementary mechanisms: on the one hand, TADs are thought to increase the chances that regulatory elements meet each other by confining them within the same domain; on the other hand, they segregate physical interactions across domain boundaries, making unwanted interactions less frequent.
It is, however, unclear whether the properties that have been attributed to TADs are specific to TADs, or rather common features across the whole hierarchy. To address this question, I have implemented an algorithm named Caller of Topological Chromosomal Hierarchies (CaTCH). CaTCH is able to detect nested hierarchies of domains, allowing a comprehensive analysis of structural and functional properties across the folding hierarchy. By applying CaTCH to published Hi-C data in mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs), I showed that TADs emerge as a functionally privileged scale. In particular, TADs appear to be the scale where accumulation of CTCF at domain boundaries and transcriptional co-regulation during differentiation is maximal. Moreover, TADs appear to be the folding scale where the partitioning of interactions within transcriptionally active domains (and notably between active enhancers and promoters) is optimized. 3C-based methods have enabled fundamental discoveries such as the existence of TADs and CTCF-mediated chromatin loops. 3C methods detect chromatin interactions as ligation products after crosslinking the DNA. Crosslinking and ligation have often been criticized as potential sources of experimental biases, raising the question of whether TADs and CTCF-mediated chromatin loops actually exist in living cells. To address this, in collaboration with Josef Redolfi, we developed a new method termed ‘DamC’ which combines DNA methylation with physical modeling to detect chromosomal interactions in living cells, at the molecular scale, without relying on crosslinking and ligation. By applying DamC to mouse ESCs, we provide the first in vivo and crosslinking- and ligation-free validation of chromosomal structures detected by 3C-methods, namely TADs and CTCF-mediated chromatin loops. DamC, together with 3C-based methods, has thus shown that mammalian chromosomes possess a rich hierarchy of folding layers.
An important challenge in the field is to understand the mechanisms that drive the establishment of these folding layers. In this sense, polymer physics represents a powerful tool to gain mechanistic insights into the hierarchical folding of mammalian chromosomes. In polymer models, the scaling of contact probability, i.e. the contact probability as a function of genomic distance, has often been used to benchmark polymer simulations and test alternative models. However, the scaling of contact probability is only one of the many properties that characterize polymer models, raising the question of whether it would be enough to discriminate alternative polymer models. To address this, I have built finite-size heteropolymer models characterized by random interactions. I showed that finite-size effects, together with the heterogeneity of the interactions, are sufficient to reproduce the observed range of scaling of contact probability. This suggests that one should be careful in discriminating polymer models of chromatin folding based solely on the scaling. In conclusion, my findings have contributed to a better understanding of chromatin folding, which is essential to really understand how enhancers act on promoters. The comprehensive analyses using CaTCH have provided conceptually new insights into how the architectural functionality of TADs may be established. My work on heteropolymer models has highlighted the fact that one should be careful in using solely scaling to discriminate physical models for chromatin folding. Finally, the ability to detect TADs and chromatin loops using DamC represents a fundamental result since it provides the first orthogonal in vivo validation of chromosomal structures that had essentially relied on a single technology.
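
The scaling of contact probability described in this abstract, i.e. contact probability as a function of genomic distance, can be illustrated with a minimal sketch. It assumes a square, symmetric contact matrix given as nested Python lists and fits a single power-law exponent by least squares in log-log space. The function names and the toy matrix are hypothetical and are not taken from the thesis.

```python
import math

def contact_probability_by_distance(matrix):
    """Mean contact value at each genomic separation s, measured in bins."""
    n = len(matrix)
    profile = {}
    for s in range(1, n):
        values = [matrix[i][i + s] for i in range(n - s)]
        profile[s] = sum(values) / len(values)
    return profile

def scaling_exponent(profile):
    """Least-squares slope of log P(s) against log s (zero entries are skipped)."""
    points = [(math.log(s), math.log(p)) for s, p in profile.items() if p > 0]
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Toy matrix (an assumption for illustration): contacts decay exactly as 1/s.
n_bins = 50
toy = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n_bins)]
       for i in range(n_bins)]
alpha = scaling_exponent(contact_probability_by_distance(toy))
print(round(alpha, 6))
```

A single fitted exponent compresses the whole curve into one number, which is one way to see the abstract's caution: different models can produce similar exponents over a finite range of distances.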

    Deciphering hierarchical organization of topologically associated domains through change-point testing.

    BACKGROUND: The nucleus of eukaryotic cells spatially packages chromosomes into a hierarchical and distinct segregation that plays critical roles in maintaining transcription regulation. High-throughput methods of chromosome conformation capture, such as Hi-C, have revealed topologically associating domains (TADs) that are defined by biased chromatin interactions within them. RESULTS: We introduce a novel method, HiCKey, to decipher hierarchical TAD structures in Hi-C data and compare them across samples. We first derive a generalized likelihood-ratio (GLR) test for detecting change-points in an interaction matrix that follows a negative binomial distribution or general mixture distribution. We then employ several optimal search strategies to decipher hierarchical TADs with p values calculated by the GLR test. Large-scale validations of simulation data show that HiCKey has good precision in recalling known TADs and is robust against random collisions of chromatin interactions. By applying HiCKey to Hi-C data of seven human cell lines, we identified multiple layers of TAD organization among them, but the vast majority had no more than four layers. In particular, we found that TAD boundaries are significantly enriched in active chromosomal regions compared to repressed regions. CONCLUSIONS: HiCKey is optimized for processing large matrices constructed from high-resolution Hi-C experiments. The method and theoretical result of the GLR test provide a general framework for significance testing of similar experimental chromatin interaction data that may not fully follow negative binomial distributions but rather more general mixture distributions
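
The idea of a generalized likelihood-ratio test for a change-point can be sketched in a much simpler setting than the one the abstract describes. The example below assumes a one-dimensional sequence of counts and a Poisson model, not the negative binomial or mixture distributions on interaction matrices that HiCKey uses. It scans every split position and returns the one with the largest statistic. All names are hypothetical, and no p-value is computed.

```python
import math

def _segment_loglik(total, length):
    """Poisson log-likelihood of a segment at its maximum-likelihood rate.

    Terms that cancel in the likelihood ratio (the rate sums and the
    log-factorials) are dropped.
    """
    return total * math.log(total / length) if total > 0 else 0.0

def best_change_point(counts):
    """Return (k, statistic) for the split counts[:k] | counts[k:] with the largest GLR."""
    n = len(counts)
    total = sum(counts)
    null_loglik = _segment_loglik(total, n)
    best_k, best_stat = None, 0.0
    left_total = 0
    for k in range(1, n):
        left_total += counts[k - 1]
        split_loglik = (_segment_loglik(left_total, k)
                        + _segment_loglik(total - left_total, n - k))
        stat = 2.0 * (split_loglik - null_loglik)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Toy data (an assumption): the rate drops after the fifth observation.
counts = [20, 22, 19, 21, 20, 5, 4, 6, 5, 5]
k, stat = best_change_point(counts)
print(k, round(stat, 2))
```

Applying the same scan recursively to each resulting segment is one common way to obtain a nested segmentation. Whether that matches HiCKey's search strategies is not stated in the abstract.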

    Measuring chromosomal interactions in living cells

    3C-based high-throughput sequencing methods such as Hi-C, 5C and 4C have substantially contributed to our current understanding of genome folding. These techniques have been instrumental in demonstrating that mammalian chromosomes possess a rich hierarchy of structural layers at the heart of which topologically associating domains (TADs) stand out as preferential functional units in the genome. TADs have been suggested to establish the correct interaction patterns between regulatory sequences, supported by genetic studies where the deletion of boundary elements resulted in ectopic gene expression in the neighboring domain. Within TADs, looping interactions occur between regulatory sequences and convergent binding sites of the architectural protein CTCF, the latter as a consequence of loop extrusion by cohesin that is blocked by CTCF bound to DNA in a defined orientation. The dominant role of CTCF in loop formation is further highlighted by induced depletion experiments and targeted deletions and inversion of CTCF sites manifesting in loss of these interactions. Despite these fundamental discoveries and their implications for transcriptional control by cis-regulatory sequences, 3C and derivatives are based on formaldehyde crosslinking and ligation, which have often been criticized as a source of important experimental bias. This has raised the question of whether structures detected by 3C methods really exist in living cells. Based on discrepancies between 5C and DNA-FISH data, it was suggested that 3C-based methods might not always capture spatial proximity or molecular-scale interactions, but rather detect DNA fragments which are hundreds of nanometers apart through crosslinking of macromolecular protein complexes between them. At the same time, it was debated whether the capture of ligation products might be variable depending on sequence context, therefore over- or underrepresenting some interactions detected in 3C-based methods.
Even though several other methods including native 4C/Hi-C, GAM and SPRITE have also detected chromatin compartmentalization, TADs and looping interactions, they still involve substantial biochemical manipulation of cells, notably either crosslinking or ligation. Importantly, many mechanistic models of chromosome folding rely on 3C-based data, making the assumption that crosslinking frequency is proportional to absolute contact frequency. However, a formal proof of this is still missing. In order to measure chromosomal contacts directly in living cells, without chemical fixation or ligation, I developed an alternative approach based on the DamID technique that exploits detection of ectopic adenine methylation by the bacterial methyltransferase Dam. In the original version of DamID, Dam is fused to a DNA binding protein of interest, resulting in adenine methylation within GATC motifs in the neighborhood of the DNA binding sites. The methylation-sensitive restriction enzyme DpnI is then used to detect methylated GATCs followed by high throughput sequencing of the restriction sites. After mapping the reads and normalizing for non-specific methylation by freely diffusing Dam, the binding sites of the protein of interest can be detected genome wide. I established a new modified version of this technique called DamC, where Dam is recruited in an inducible way to ectopically inserted Tet operators through fusion to the reverse tetracycline receptor. The detection of methylated DNA by high-throughput sequencing then allows the identification of chromosomal contacts at high genomic resolution across hundreds of kilobases around viewpoints. Importantly, modeling of this process provides a theoretical framework showing that the experimental output of DamC is indeed proportional to chromosomal contact probabilities. DamC provides the first crosslinking- and ligation-free validation of key structural features of mammalian chromosomes identified by 3C methods.
It confirms the existence of TADs and CTCF loops as well as the scaling of contact probabilities measured in 4C and Hi-C, which supports the validity of physical models of chromosome folding based on 3C-based data. Finally, it demonstrates that ectopic insertion of CTCF sites can lead to the formation of new loops with endogenous CTCF-bound sequences. This shows that chromosome structure can be engineered by inserting short ectopic sequences that rewire interactions within TADs, opening interesting avenues for modifying gene expression by altering chromosomal interactions rather than regulatory DNA sequences directly

    SCL: A Lattice-Based Approach To Infer 3D Chromosome Structures From Single-Cell Hi-C Data

    Motivation: In contrast to population-based Hi-C data, single-cell Hi-C data are zero-inflated and do not indicate the frequency of proximate DNA segments. There are a limited number of computational tools that can model the 3D structures of chromosomes based on single-cell Hi-C data. Results: We developed single-cell lattice (SCL), a computational method to reconstruct 3D structures of chromosomes based on single-cell Hi-C data. We designed a loss function and a 2D Gaussian function specifically for the characteristics of single-cell Hi-C data. A chromosome is represented as beads-on-a-string and stored in a 3D cubic lattice. Metropolis–Hastings simulation and simulated annealing are used to simulate the structure and minimize the loss function. We evaluated the SCL-inferred 3D structures (at both 500 and 50 kb resolutions) using multiple criteria and compared them with the ones generated by another modeling software program. The results indicate that the 3D structures generated by SCL closely fit single-cell Hi-C data. We also found similar patterns of trans-chromosomal contact beads, Lamin-B1 enriched topologically associating domains (TADs), and H3K4me3 enriched TADs by mapping data from previous studies onto the SCL-inferred 3D structures. Availability and Implementation: The C++ source code of SCL is freely available at http://dna.cs.miami.edu/SCL/
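
The combination of a beads-on-a-string chain on a cubic lattice, Metropolis acceptance and simulated annealing can be sketched as follows. This is not SCL's loss function or move set, which the abstract does not specify. The loss used here (sum of squared distances between contacting beads), the connectivity rule (chain neighbours at most one lattice step apart in each coordinate), the cooling schedule and all names are assumptions chosen to keep the example short.

```python
import math
import random

MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def loss(beads, contacts):
    """Sum of squared Euclidean distances over all contacting bead pairs."""
    return sum(sum((a - b) ** 2 for a, b in zip(beads[i], beads[j]))
               for i, j in contacts)

def chain_ok(beads, i):
    """Bead i must stay within one lattice step of each chain neighbour."""
    for j in (i - 1, i + 1):
        if 0 <= j < len(beads):
            if max(abs(a - b) for a, b in zip(beads[i], beads[j])) > 1:
                return False
    return True

def anneal(n_beads, contacts, steps=20000, t_start=5.0, t_end=0.05, seed=0):
    """Return the lowest-loss structure visited and its loss."""
    rng = random.Random(seed)
    beads = [(i, 0, 0) for i in range(n_beads)]  # start from a straight chain
    occupied = set(beads)
    current = loss(beads, contacts)
    best, best_beads = current, list(beads)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_beads)
        dx, dy, dz = rng.choice(MOVES)
        old = beads[i]
        new = (old[0] + dx, old[1] + dy, old[2] + dz)
        if new in occupied:
            continue  # one bead per lattice site
        beads[i] = new
        if not chain_ok(beads, i):
            beads[i] = old
            continue
        proposed = loss(beads, contacts)
        delta = proposed - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            occupied.remove(old)
            occupied.add(new)
            current = proposed
            if current < best:
                best, best_beads = current, list(beads)
        else:
            beads[i] = old
    return best_beads, best

contacts = [(0, 5), (2, 9), (4, 11)]  # hypothetical single-cell contacts
structure, final_loss = anneal(12, contacts)
print(final_loss)
```

Returning the best structure visited, instead of the last one, keeps the result from being degraded by a late uphill move that the Metropolis rule happened to accept.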

    Theoretical analysis of the role of chromatin interactions in long-range action of enhancers and insulators

    Long-distance regulatory interactions between enhancers and their target genes are commonplace in higher eukaryotes. Interposed boundaries or insulators are able to block these long distance regulatory interactions. The mechanistic basis for insulator activity and how it relates to enhancer action-at-a-distance remains unclear. Here we explore the idea that topological loops could simultaneously account for regulatory interactions of distal enhancers and the insulating activity of boundary elements. We show that while loop formation is not in itself sufficient to explain action at a distance, incorporating transient non-specific and moderate attractive interactions between the chromatin fibers strongly enhances long-distance regulatory interactions and is sufficient to generate a euchromatin-like state. Under these same conditions, the subdivision of the loop into two topologically independent loops by insulators inhibits inter-domain interactions. The underlying cause of this effect is a suppression of crossings in the contact map at intermediate distances. Thus our model simultaneously accounts for regulatory interactions at a distance and the insulator activity of boundary elements. This unified model of the regulatory roles of chromatin loops makes several testable predictions that could be confronted with in vitro experiments, as well as genomic chromatin conformation capture and fluorescent microscopic approaches. Comment: 10 pages, originally submitted to an (undisclosed) journal in May 201

    Reciprocal insulation analysis of Hi-C data shows that TADs represent a functionally but not structurally privileged scale in the hierarchical folding of chromosomes

    Understanding how regulatory sequences interact in the context of chromosomal architecture is a central challenge in biology. Chromosome conformation capture revealed that mammalian chromosomes possess a rich hierarchy of structural layers, from multi-megabase compartments to sub-megabase topologically associating domains (TADs) and sub-TAD contact domains. TADs appear to act as regulatory microenvironments by constraining and segregating regulatory interactions across discrete chromosomal regions. However, it is unclear whether other (or all) folding layers share similar properties, or rather TADs constitute a privileged folding scale with maximal impact on the organization of regulatory interactions. Here, we present a novel algorithm named CaTCH that identifies hierarchical trees of chromosomal domains in Hi-C maps, stratified through their reciprocal physical insulation, which is a single and biologically relevant parameter. By applying CaTCH to published Hi-C data sets, we show that previously reported folding layers appear at different insulation levels. We demonstrate that although no structurally privileged folding level exists, TADs emerge as a functionally privileged scale defined by maximal boundary enrichment in CTCF and maximal cell-type conservation. By measuring transcriptional output in embryonic stem cells and neural precursor cells, we show that the likelihood that genes in a domain are coregulated during differentiation is also maximized at the scale of TADs. Finally, we observe that regulatory sequences occur at genomic locations corresponding to optimized mutual interactions at the same scale. Our analysis suggests that the architectural functionality of TADs arises from the interplay between their ability to partition interactions and the specific genomic position of regulatory sequences
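
Physical insulation between neighbouring regions of a Hi-C map can be illustrated with a generic sliding-window insulation profile. This is a common and simpler measure, not CaTCH's reciprocal insulation, whose definition the abstract does not give. The sketch assumes a square contact matrix as nested lists; the function name, window size and toy matrix are hypothetical.

```python
def insulation_profile(matrix, window):
    """Mean contact between the `window` bins upstream and downstream of each position.

    Position b denotes the boundary between bins b - 1 and b. Low values mean
    that the two flanks interact little, i.e. the position is well insulated.
    """
    n = len(matrix)
    profile = {}
    for b in range(window, n - window + 1):
        values = [matrix[i][j]
                  for i in range(b - window, b)
                  for j in range(b, b + window)]
        profile[b] = sum(values) / len(values)
    return profile

# Toy map (an assumption): two 10-bin domains, dense inside and sparse between.
n_bins, domain = 20, 10
toy = [[10.0 if i // domain == j // domain else 1.0 for j in range(n_bins)]
       for i in range(n_bins)]
profile = insulation_profile(toy, window=3)
boundary = min(profile, key=profile.get)
print(boundary, profile[boundary])
```

Varying the window changes the scale at which boundaries appear, which loosely mirrors the abstract's point that different folding layers show up at different insulation levels.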