560 research outputs found

    Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps

    Continuous improvements to high-throughput conformation capture (Hi-C) are revealing richer information about the spatial organization of the chromatin and its role in cellular functions. Several studies have confirmed the existence of structural features of the genome 3D organization that are stable across cell types and conserved across species, called topological associating domains (TADs). The detection of TADs has become a critical step in the analysis of Hi-C data, e.g., to identify enhancer-promoter associations. Here we present East, a novel TAD identification algorithm based on fast 2D convolution of Haar-like features, that is as accurate as the state-of-the-art method based on the directionality index, but 75-80x faster. East is available in the public domain at https://github.com/ucrbioinfo/EAST
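
The abstract above mentions Haar-like features evaluated quickly on a contact map. The following is a minimal sketch of that general idea, not the East implementation: it assumes a symmetric contact matrix, uses a summed-area table so each rectangular sum costs constant time, and scores each bin with a hypothetical feature (contacts inside the two flanking squares minus contacts across the bin). The function names, the window size `w`, and the scoring rule are all illustrative assumptions.

```python
import numpy as np

def integral_image(m):
    """Summed-area table with a zero row and column prepended."""
    s = np.zeros((m.shape[0] + 1, m.shape[1] + 1), dtype=float)
    s[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    return s

def block_sum(s, r0, r1, c0, c1):
    """Sum of m[r0:r1, c0:c1], read in O(1) from the summed-area table s."""
    return s[r1, c1] - s[r0, c1] - s[r1, c0] + s[r0, c0]

def boundary_score(m, w):
    """Hypothetical Haar-like score per bin (not East's actual feature).

    For bin i, add the contacts inside the w-by-w squares just upstream
    and just downstream of i, and subtract twice the contacts that cross i.
    Bins too close to the matrix edge are left as NaN.
    """
    n = m.shape[0]
    s = integral_image(m)
    score = np.full(n, np.nan)
    for i in range(w, n - w + 1):
        up = block_sum(s, i - w, i, i - w, i)      # upstream square
        down = block_sum(s, i, i + w, i, i + w)    # downstream square
        cross = block_sum(s, i - w, i, i, i + w)   # contacts across bin i
        score[i] = up + down - 2 * cross
    return score

# Toy map: two 4-bin domains with no contacts between them.
toy = np.zeros((8, 8))
toy[:4, :4] = 1
toy[4:, 4:] = 1
scores = boundary_score(toy, w=2)
print(int(np.nanargmax(scores)))  # 4 for this toy map
```

On real Hi-C data the matrix would typically be normalized first, and a boundary caller would pick local maxima of the score rather than a single global maximum.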

    Methods for developing a machine learning framework for precise 3D domain boundary prediction at base-level resolution

    High-throughput chromosome conformation capture technology (Hi-C) has revealed extensive DNA looping and folding into discrete 3D domains. These include Topologically Associating Domains (TADs) and chromatin loops, the 3D domains critical for cellular processes like gene regulation and cell differentiation. The relatively low resolution of Hi-C data (regions of several kilobases in size) prevents precise mapping of domain boundaries by conventional TAD/loop-callers. However, high-resolution genomic annotations associated with boundaries, such as CTCF and members of the cohesin complex, suggest a computational approach for precisely locating domain boundaries. We developed preciseTAD, an optimized machine learning framework that leverages a random forest model to improve the location of domain boundaries. Our method introduces three concepts (shifted binning, distance-type predictors, and random under-sampling), which we use to build classification models for predicting boundary regions. The algorithm then uses density-based clustering (DBSCAN) and partitioning around medoids (PAM) to extract the most biologically meaningful domain boundary from models trained on high-resolution genome annotation data and boundaries from low-resolution Hi-C data. We benchmarked our method against a popular TAD-caller and a novel chromatin loop prediction algorithm. Boundaries predicted by preciseTAD were more enriched for known molecular drivers of 3D chromatin including CTCF, RAD21, SMC3, and ZNF143. preciseTAD-predicted boundaries were more conserved across cell lines, highlighting their higher biological significance. Additionally, models pre-trained in one cell line accurately predict boundaries in another cell line. Using cell line-specific genomic annotations, the pre-trained models enable detecting domain boundaries in cells without Hi-C data. The research presented provides a unified approach for precisely predicting domain boundaries. This improved precision will provide insight into the association between genomic regulators and the 3D genome organization. Furthermore, our methods will provide researchers with flexible and easy-to-use tools to continue to annotate the 3D structure of the human genome without relying on costly high-resolution Hi-C data. The preciseTAD R package and supplementary ExperimentHub package, preciseTADhub, are available on Bioconductor (version 3.13; https://bioconductor.org/packages/preciseTAD/; https://bioconductor.org/packages/preciseTADhub/).
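
Two of the concepts named in the abstract above, distance-type predictors and random under-sampling, can be illustrated with a short sketch. This is not the preciseTAD R API. It is one plausible reading written in Python, assuming that a distance-type predictor is the distance from a genomic bin's center to the nearest annotation site (for example a CTCF site), and that under-sampling keeps every minority-class bin plus an equally sized random subset of the majority class. All function names and inputs are hypothetical.

```python
import numpy as np

def distance_to_nearest(bin_centers, sites):
    """Distance in bp from each bin center to the nearest annotation site."""
    sites = np.sort(np.asarray(sites))
    centers = np.asarray(bin_centers)
    # Index of the first site at or after each center.
    idx = np.searchsorted(sites, centers)
    # Neighbouring sites on each side, clipped at the ends of the array.
    left = sites[np.clip(idx - 1, 0, len(sites) - 1)]
    right = sites[np.clip(idx, 0, len(sites) - 1)]
    return np.minimum(np.abs(centers - left), np.abs(centers - right))

def random_undersample(labels, rng):
    """Row indices keeping all minority rows and an equal-sized
    random subset of majority rows (labels are 0 or 1)."""
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = rng.choice(majority, size=len(minority), replace=False)
    return np.sort(np.concatenate([minority, keep]))

# Hypothetical inputs: three bin centers and two annotation sites.
features = distance_to_nearest([500, 1500, 2500], [1000, 2600])
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # boundary bins are rare
balanced = random_undersample(labels, np.random.default_rng(0))
```

Balancing the classes this way matters because boundary bins are a small fraction of all bins, so a classifier trained on the raw data could score well by predicting "no boundary" everywhere.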

    Determining cellular CTCF and cohesin abundances to constrain 3D genome models.

    Achieving a quantitative and predictive understanding of 3D genome architecture remains a major challenge, as it requires quantitative measurements of the key proteins involved. Here, we report the quantification of CTCF and cohesin, two causal regulators of topologically associating domains (TADs) in mammalian cells. Extending our previous imaging studies (Hansen et al., 2017), we estimate bounds on the density of putatively DNA loop-extruding cohesin complexes and CTCF binding site occupancy. Furthermore, co-immunoprecipitation studies of an endogenously tagged subunit (Rad21) suggest the presence of cohesin dimers and/or oligomers. Finally, based on our cell lines with accurately measured protein abundances, we report a method to conveniently determine the number of molecules of any Halo-tagged protein in the cell. We anticipate that our results and the established tool for measuring cellular protein abundances will advance a more quantitative understanding of 3D genome organization and facilitate protein quantification, which is key to understanding diverse biological processes.

    3D Organization of Eukaryotic and Prokaryotic Genomes

    There is a complex mutual interplay between three-dimensional (3D) genome organization and cellular activities in bacteria and eukaryotes. The aim of this thesis is to investigate such structure-function relationships. A main part of this thesis deals with the study of the three-dimensional genome organization using novel techniques for detecting genome-wide contacts using next-generation sequencing. These so-called chromatin conformation capture-based methods, such as 5C and Hi-C, give deep insights into the architecture of the genome inside the nucleus, even on a small scale. We shed light on the question of how the rapidly growing body of Hi-C data can generate new insights about the way the genome is organized in 3D. To this end, we first present the typical Hi-C data processing workflow to obtain Hi-C contact maps and show potential pitfalls in the interpretation of such contact maps using our own data pipeline and publicly available Hi-C data sets. Subsequently, we focus on approaches to modeling 3D genome organization based on contact maps. In this context, a computational tool was developed which interactively visualizes contact maps alongside complementary genomic data tracks. Inspired by machine learning with the help of probabilistic graphical models, we developed a tool that detects the compartmentalization structure within contact maps on multiple scales. In a further project, we propose and test one possible mechanism for the observed compartmentalization within contact maps of genomes across multiple species: dynamic formation of loops within domains. In the context of 3D organization of bacterial chromosomes, we present the first direct evidence for global restructuring by long-range interactions of a DNA-binding protein. Using Hi-C and live cell imaging of DNA loci, we show that the DNA-binding protein Rok forms insulator-like complexes looping the B. subtilis genome over large distances. This biological mechanism agrees with our model based on dynamic formation of loops affecting domain formation in eukaryotic genomes. We further investigate the spatial segregation of the E. coli chromosome during cell division. In particular, we are interested in the positioning of the chromosomal replication origin region based on its interaction with the protein complex MukBEF. We tackle the problem using a combined approach of stochastic and polymer simulations. Last but not least, we develop a completely new methodology to analyze single molecule localization microscopy images based on topological data analysis. By using this new approach in the analysis of irradiated cells, we are able to show that the topology of repair foci can be categorized depending on their distance to heterochromatin.

    Marchantia TCP transcription factor activity correlates with three-dimensional chromatin structure

    Genomic information is encoded not only in sequence and epigenetic modifications but also in the folding of the genome in 3D space. Recent developments in Chromosome Conformation Capture techniques have enabled us to unveil the spatial positioning of the genome at different scales. Self-interacting genomic regions, named Topologically Associated Domains (TADs), were discovered by Hi-C as a key feature of genome organization beyond the nucleosomal level. Each TAD is an isolated local packing unit in which intra-TAD interactions are favoured and inter-TAD interactions are insulated. In animals, several architectural proteins have been shown to contribute to the structure and function of TADs. In plants, by contrast, TAD formation, TAD function, and the proteins that play a role in these processes remain largely unknown. Our Hi-C analyses show that the genome of Marchantia polymorpha, a member of a basal land plant lineage, shares an evolutionarily conserved 3D landscape with that of higher plants. The Marchantia genome is subdivided into hundreds of TADs, and their borders are associated with TCP1 protein binding. Genome-wide epigenetic analysis reveals that a considerable fraction of Marchantia TADs represent interstitial heterochromatin and are decorated with repressive epigenetic marks. We also identify a novel type of TAD that we name TCP1-rich TAD, in which genomic regions are highly accessible and densely bound by TCP1 proteins. TCP1-bound genes residing in TCP1-rich TADs exhibit lower gene expression levels than TCP1-bound genes in other locations. In tcp1 mutants, TAD patterns in the Hi-C map do not change, indicating that TCP1 protein is not essential for TAD formation and structure. However, we find that in tcp1 mutants, genes residing in TCP1-rich TADs show greater expression fold changes than genes not belonging to these TADs. Our results indicate that, besides serving as spatial chromatin packing modules, plant TADs function as nuclear micro-compartments that correlate with transcription factor activities.