560 research outputs found
Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps
Continuous improvements to high-throughput conformation capture (Hi-C) are revealing richer information about the spatial organization of the chromatin and its role in cellular functions. Several studies have confirmed the existence of structural features of the genome 3D organization that are stable across cell types and conserved across species, called topologically associating domains (TADs). The detection of TADs has become a critical step in the analysis of Hi-C data, e.g., to identify enhancer-promoter associations. Here we present East, a novel TAD identification algorithm based on fast 2D convolution of Haar-like features, that is as accurate as the state-of-the-art method based on the directionality index, but 75-80x faster. East is available in the public domain at https://github.com/ucrbioinfo/EAST
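The idea of scoring a contact map with Haar-like features can be illustrated with a small sketch. This is not the East implementation: the function names, the particular feature (two intra-domain squares minus the square between them) and the toy matrix are all assumptions made for illustration. It uses a summed-area table so that each rectangular sum costs constant time, which is the usual reason Haar-like features are fast to evaluate.

```python
def integral_image(matrix):
    """Summed-area table with one extra row and column of zeros."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    ii = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for r in range(n_rows):
        row_sum = 0
        for c in range(n_cols):
            row_sum += matrix[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def block_sum(ii, r0, c0, r1, c1):
    """Sum of matrix[r0:r1][c0:c1] in O(1) using the summed-area table."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


def boundary_score(ii, i, w):
    """Hypothetical Haar-like feature at diagonal bin i with window w.

    Contacts inside the upstream and downstream squares count positively,
    contacts between them count negatively, so the score peaks where two
    self-interacting blocks meet. Requires w <= i <= n - w.
    """
    upstream = block_sum(ii, i - w, i - w, i, i)
    downstream = block_sum(ii, i, i, i + w, i + w)
    between = block_sum(ii, i - w, i, i, i + w)
    return upstream + downstream - 2 * between


# Toy contact map (assumed data): two 4-bin domains with no contacts between them.
n, w = 8, 2
toy = [[1 if (r < 4) == (c < 4) else 0 for c in range(n)] for r in range(n)]
ii = integral_image(toy)
best_bin = max(range(w, n - w + 1), key=lambda i: boundary_score(ii, i, w))
```

On the toy map the score is highest at the bin where the two blocks meet. Real Hi-C matrices are noisy and distance-dependent, so a practical caller would need normalization and a search over domain sizes, neither of which is shown here.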
Methods for developing a machine learning framework for precise 3D domain boundary prediction at base-level resolution
High-throughput chromosome conformation capture technology (Hi-C) has revealed extensive DNA looping and folding into discrete 3D domains. These include Topologically Associating Domains (TADs) and chromatin loops, 3D domains that are critical for cellular processes such as gene regulation and cell differentiation. The relatively low resolution of Hi-C data (regions of several kilobases in size) prevents precise mapping of domain boundaries by conventional TAD/loop-callers. However, high-resolution genomic annotations associated with boundaries, such as CTCF and members of the cohesin complex, suggest a computational approach for precisely locating domain boundaries.
We developed preciseTAD, an optimized machine learning framework that leverages a random forest model to improve the location of domain boundaries. Our method introduces three concepts - shifted binning, distance-type predictors, and random under-sampling - which we use to build classification models for predicting boundary regions. The algorithm then uses density-based clustering (DBSCAN) and partitioning around medoids (PAM) to extract the most biologically meaningful domain boundary from models trained on high-resolution genome annotation data and boundaries from low-resolution Hi-C data. We benchmarked our method against a popular TAD-caller and a novel chromatin loop prediction algorithm.
Boundaries predicted by preciseTAD were more enriched for known molecular drivers of 3D chromatin including CTCF, RAD21, SMC3, and ZNF143. preciseTAD-predicted boundaries were more conserved across cell lines, highlighting their higher biological significance. Additionally, models pre-trained in one cell line accurately predict boundaries in another cell line. Using cell line-specific genomic annotations, the pre-trained models enable detecting domain boundaries in cells without Hi-C data.
The research presented provides a unified approach for precisely predicting domain boundaries. This improved precision will provide insight into the association between genomic regulators and the 3D genome organization. Furthermore, our methods will provide researchers with flexible and easy-to-use tools to continue to annotate the 3D structure of the human genome without relying on costly high-resolution Hi-C data. The preciseTAD R package and supplementary ExperimentHub package, preciseTADhub, are available on Bioconductor (version 3.13; https://bioconductor.org/packages/preciseTAD/; https://bioconductor.org/packages/preciseTADhub/).
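Two of the concepts named in the preciseTAD abstract above, distance-type predictors and random under-sampling, can be sketched in a few lines. This is an illustrative sketch only and does not reproduce the preciseTAD R package or its API: the function names, the use of a plain base-pair distance to the nearest annotation site, and the toy coordinates are all assumptions.

```python
import bisect
import random


def nearest_distance(position, sorted_sites):
    """Distance in bp from a bin centre to the nearest annotated site.

    sorted_sites must be a non-empty, ascending list of site coordinates
    (for example, hypothetical CTCF peak centres).
    """
    if not sorted_sites:
        raise ValueError("sorted_sites must not be empty")
    idx = bisect.bisect_left(sorted_sites, position)
    candidates = []
    if idx < len(sorted_sites):
        candidates.append(sorted_sites[idx] - position)
    if idx > 0:
        candidates.append(position - sorted_sites[idx - 1])
    return min(candidates)


def undersample(labels, rng):
    """Return sorted indices that keep every minority-class example and an
    equally sized random subset of the majority class (labels are 0 or 1)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Assumed toy data: two annotation sites and eight bins, two of them boundaries.
sites = [1000, 5000]
labels = [0, 0, 0, 1, 0, 0, 1, 0]
balanced = undersample(labels, random.Random(0))
```

The balanced index list could then be used to select rows of a predictor matrix before fitting a classifier. Boundary bins are rare relative to non-boundary bins, which is why some form of class balancing is needed before training.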
Determining cellular CTCF and cohesin abundances to constrain 3D genome models.
Achieving a quantitative and predictive understanding of 3D genome architecture remains a major challenge, as it requires quantitative measurements of the key proteins involved. Here, we report the quantification of CTCF and cohesin, two causal regulators of topologically associating domains (TADs) in mammalian cells. Extending our previous imaging studies (Hansen et al., 2017), we estimate bounds on the density of putatively DNA loop-extruding cohesin complexes and CTCF binding site occupancy. Furthermore, co-immunoprecipitation studies of an endogenously tagged subunit (Rad21) suggest the presence of cohesin dimers and/or oligomers. Finally, based on our cell lines with accurately measured protein abundances, we report a method to conveniently determine the number of molecules of any Halo-tagged protein in the cell. We anticipate that our results and the established tool for measuring cellular protein abundances will advance a more quantitative understanding of 3D genome organization and facilitate protein quantification, which is key to comprehending diverse biological processes.
3D Organization of Eukaryotic and Prokaryotic Genomes
There is a complex mutual interplay between three-dimensional (3D) genome organization and cellular activities in bacteria and eukaryotes. The aim of this thesis is to investigate such structure-function relationships.
A main part of this thesis deals with the study of three-dimensional genome organization using novel techniques that detect genome-wide contacts with next-generation sequencing. These so-called chromatin conformation capture-based methods, such as 5C and Hi-C, give deep insights into the architecture of the genome inside the nucleus, even on a small scale. We shed light on the question of how the rapidly growing body of Hi-C data can generate new insights into the way the genome is organized in 3D.
To this end, we first present the typical Hi-C data processing workflow to obtain Hi-C contact maps and show potential pitfalls in the interpretation of such contact maps using our own data pipeline and publicly available Hi-C data sets. Subsequently, we focus on approaches to modeling 3D genome organization based on contact maps. In this context, a computational tool was developed which interactively visualizes contact maps alongside complementary genomic data tracks. Inspired by machine learning with the help of probabilistic graphical models, we developed a tool that detects the compartmentalization structure within contact maps on multiple scales. In a further project, we propose and test one possible mechanism for the observed compartmentalization within contact maps of genomes across multiple species: Dynamic formation of loops within domains.
In the context of 3D organization of bacterial chromosomes, we present the first direct evidence for global restructuring by long-range interactions of a DNA binding protein. Using Hi-C and live cell imaging of DNA loci, we show that the DNA binding protein Rok forms insulator-like complexes looping the B. subtilis genome over large distances. This biological mechanism agrees with our model based on dynamic formation of loops affecting domain formation in eukaryotic genomes. We further investigate the spatial segregation of the E. coli chromosome during cell division. In particular, we are interested in the positioning of the chromosomal replication origin region based on its interaction with the protein complex MukBEF. We tackle the problem using a combined approach of stochastic and polymer simulations.
Last but not least, we develop a completely new methodology to analyze single molecule localization microscopy images based on topological data analysis. By using this new approach in the analysis of irradiated cells, we are able to show that the topology of repair foci can be categorized depending on the distance to heterochromatin.
Marchantia TCP transcription factor activity correlates with three-dimensional chromatin structure
Genomic information is encoded not only in the sequence and its epigenetic modifications but also in the folding of the genome in 3D space. Recent developments in Chromosome Conformation Capture techniques have enabled us to unveil the spatial positioning of the genome at different scales. The formation of self-interacting genomic regions, named Topologically Associated Domains (TADs), was discovered by Hi-C as a key feature of genome organization beyond the nucleosomal level. Each TAD is an isolated local packing unit in which intra-TAD interactions are favoured and inter-TAD interactions are insulated. In animals, several architectural proteins have been shown to contribute to the structure and function of TADs. In plants, by contrast, TAD formation and function, and the proteins that play a role in these processes, remain largely unknown.
Our Hi-C analyses show that the genome of Marchantia polymorpha, a member of a basal land plant lineage, shares an evolutionarily conserved 3D landscape with that of higher plants. The Marchantia genome is subdivided into hundreds of TADs, and their borders are associated with TCP1 protein binding. Genome-wide epigenetic analysis reveals that a considerable fraction of Marchantia TADs represent interstitial heterochromatin and are decorated with repressive epigenetic marks. We also identify a novel type of TAD, which we name TCP1-rich TAD, in which genomic regions are highly accessible and densely bound by TCP1 proteins. TCP1-bound genes residing in TCP1-rich TADs exhibit lower expression levels than TCP1-bound genes in other locations.
In tcp1 mutants, TAD patterns in the Hi-C map do not change, indicating that the TCP1 protein is not essential for TAD formation and structure. However, we find that in tcp1 mutants, genes residing in TCP1-rich TADs show larger expression fold changes than genes not belonging to these TADs. Our results indicate that, besides serving as spatial chromatin packing modules, plant TADs function as nuclear micro-compartments that correlate with transcription factor activities.