1,052 research outputs found
Recommended from our members
The Rabl configuration limits topological entanglement of chromosomes in budding yeast.
The three dimensional organization of genomes remains mostly unknown due to their high degree of condensation. Biophysical studies predict that condensation promotes the topological entanglement of chromatin fibers and the inhibition of function. How organisms balance between functionally active genomes and a high degree of condensation remains to be determined. Here we hypothesize that the Rabl configuration, characterized by the attachment of centromeres and telomeres to the nuclear envelope, helps to reduce the topological entanglement of chromosomes. To test this hypothesis we developed a novel method to quantify chromosome entanglement complexity in 3D reconstructions obtained from Chromosome Conformation Capture (CCC) data. Applying this method to published data of the yeast genome, we show that computational models implementing the attachment of telomeres or centromeres alone are not sufficient to obtain the reduced entanglement complexity observed in 3D reconstructions. It is only when the centromeres and telomeres are attached to the nuclear envelope (i.e. the Rabl configuration) that the complexity of entanglement of the genome is comparable to that of the 3D reconstructions. We therefore suggest that the Rabl configuration is an essential player in the simplification of the entanglement of chromatin fibers
GAMIBHEAR: whole-genome haplotype reconstruction from genome architecture mapping data
Motivation: Understanding haplotype-specific regulatory mechanisms becomes increasingly important in genomics and medical research. Investigating differences in allele-specific gene expression, epigenetic changes and their causal variants greatly benefits from haplotype reconstruction or phasing of genetic variants, but direct evidence for the haplotype structure is difficult to obtain from standard short-read sequencing data. Chromatin conformation data obtained from 3C experiments allows inference of haplotypes because inter-chromosomal contacts are more frequent than homologous intra-chromosomal contacts, but these data suffer from technical biases owing to the digestion and ligation process of the 3C technique. Genome Architecture Mapping (GAM) is a novel digestion- and ligation-free method for the inference of chromatin conformation from nuclear cryosections. Due to its high resolution and independence of enzymatic digestion it is well-suited for haplotype reconstruction and for detecting haplotype-specific chromatin contacts. Results: Here, we present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies across multiple nuclear slices and employs a GAM-specific probabilistic model of haplotype capture to optimise phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess correctness and completeness of the reconstructed haplotypes, and demonstrate the power of GAM data and the accuracy of GAMIBHEAR to infer genome-wide haplotypes. Availability: GAMIBHEAR is available as an R package under the open source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear
Maintainer: julia.markowski{at}mdc-berlin.d
Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression.
While genetic variation at chromatin loops is relevant for human disease, the relationships between contact propensity (the probability that loci at loops physically interact), genetics, and gene regulation are unclear. We quantitatively interrogate these relationships by comparing Hi-C and molecular phenotype data across cell types and haplotypes. While chromatin loops consistently form across different cell types, they have subtle quantitative differences in contact frequency that are associated with larger changes in gene expression and H3K27ac. For the vast majority of loci with quantitative differences in contact frequency across haplotypes, the changes in magnitude are smaller than those across cell types; however, the proportional relationships between contact propensity, gene expression, and H3K27ac are consistent. These findings suggest that subtle changes in contact propensity have a biologically meaningful role in gene regulation and could be a mechanism by which regulatory genetic variants in loop anchors mediate effects on expression
Recommended from our members
Common DNA sequence variation influences 3-dimensional conformation of the human genome.
BACKGROUND:The 3-dimensional (3D) conformation of chromatin inside the nucleus is integral to a variety of nuclear processes including transcriptional regulation, DNA replication, and DNA damage repair. Aberrations in 3D chromatin conformation have been implicated in developmental abnormalities and cancer. Despite the importance of 3D chromatin conformation to cellular function and human health, little is known about how 3D chromatin conformation varies in the human population, or whether DNA sequence variation between individuals influences 3D chromatin conformation. RESULTS:To address these questions, we perform Hi-C on lymphoblastoid cell lines from 20 individuals. We identify thousands of regions across the genome where 3D chromatin conformation varies between individuals and find that this variation is often accompanied by variation in gene expression, histone modifications, and transcription factor binding. Moreover, we find that DNA sequence variation influences several features of 3D chromatin conformation including loop strength, contact insulation, contact directionality, and density of local cis contacts. We map hundreds of quantitative trait loci associated with 3D chromatin features and find evidence that some of these same variants are associated at modest levels with other molecular phenotypes as well as complex disease risk. CONCLUSION:Our results demonstrate that common DNA sequence variants can influence 3D chromatin conformation, pointing to a more pervasive role for 3D chromatin conformation in human phenotypic variation than previously recognized
Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for the
biomolecular data analysis. With the combination of spectral graph method, I
reveal the essential difference between the global scale models and local scale
ones in structure clustering, i.e., different optimization on Euclidean (or
spatial) distances and sequential (or genomic) distances. More specifically,
clusters from global scale models optimize Euclidean distance relations. Local
scale models, on the other hand, result in clusters that optimize the genomic
distance relations. For a biomolecular data, Euclidean distances and sequential
distances are two independent variables, which can never be optimized
simultaneously in data clustering. However, sequence scale in my SeqMM can work
as a tuning parameter that balances these two variables and deliver different
clusterings based on my purposes. Further, my SeqMM is used to explore the
hierarchical structures of chromosomes. I find that in global scale, the
Fiedler vector from my SeqMM bears a great similarity with the principal vector
from principal component analysis, and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated from different scales
are not consistent and vary a lot. Particularly when the sequence scale is
small, the calculated TAD boundaries are dramatically different. Even for
regions with high contact frequencies, TAD regions show no obvious consistence.
However, when the scale value increases further, although TADs are still quite
different, TAD boundaries in these high contact frequency regions become more
and more consistent. Finally, I find that for a fixed local scale, my method
can deliver very robust TAD boundaries in different cluster numbers.Comment: 22 PAGES, 13 FIGURE
Data mining and machine learning methods for chromosome conformation data analysis
Sixteen years after the sequencing of the human genome, the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3-D) inference and big data remains problematic in the field of genomics, and specifically, in the field of 3C data analysis. Three-dimensional inference involves the reconstruction of a genome's 3D structure or, in some cases, ensemble of structures from contact interaction frequencies extracted from a variant of the 3C technology called the Hi-C technology. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. In this dissertation, four major contributions are described, first, 3DMax, a tool for chromosome and genome 3-D structure prediction from H-C data using optimization algorithm, second, GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years, third, ClusterTAD, a method for topological associated domains (TAD) extraction from Hi-C data using unsupervised learning algorithm. Finally, we introduce a tool called, GenomeFlow, a comprehensive graphical tool to facilitate the entire process of modeling and analysis of 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are available as software tools that are freely available to the scientific community.Includes bibliographical reference
Development of New Computational Tools for Analyzing Hi-C Data and Predicting Three-Dimensional Genome Organization
Background: The development of Hi-C (and related methods) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. There has been extensive effort in developing new tools to analyze this data in order to better understand the relationship between 3D genomic structure and function. While useful, the existing tools are far from maturity and (in some cases) lack the generalizability that would be required for application in a diverse set of organisms. This is problematic since the research community has proposed many cross-species "hallmarks" of 3D genome organization without confirming their existence in a variety of organisms.
Research Objective: Develop new, generalizable computational tools for Hi-C analysis and 3D genome prediction.
Results: Three new computational tools were developed for Hi-C analysis or 3D genome prediction: GrapHi-C (visualization), GeneRHi-C (3D prediction) and StoHi-C (3D prediction). Each tool has the potential to be used for 3D genome analysis in both model and non-model organisms since the underlying algorithms do not rely on any organism-specific constraints. A brief description of each tool follows. GrapHi-C is a graph-based visualization of Hi-C data. Unlike existing visualization methods, GrapHi-C allows for a more intuitive structural visualization of the underlying data. GeneRHi-C and StoHi-C are tools that can be used to predict 3D genome organizations from Hi-C data (the 3D-genome reconstruction problem). GeneRHi-C uses a combination of mixed integer programming and network layout algorithms to generate 3D coordinates from a ploidy-dependent subset of the Hi-C data. Alternatively, StoHi-C uses t-stochastic neighbour embedding with the complete set of Hi-C data to generate 3D coordinates of the genome. Each tool was applied to multiple, independent existing Hi-C datasets from fission yeast to demonstrate their utility. This is the first time 3D genome prediction has been successfully applied to these datasets. Overall, the tools developed here more clearly recapitulated documented features of fission yeast genomic organization when compared to existing techniques. Future work will focus on extending and applying these tools to analyze Hi-C datasets from other organisms.
Additional Information: This thesis contains a collection of papers pertaining to the development of new tools for analyzing Hi-C data and predicting 3D genome organization. Each paper's publication status (as of January 2020) has been provided at the beginning of the corresponding chapter. For published papers, reprint permission was obtained and is available in the appendix
3D Organization of Eukaryotic and Prokaryotic Genomes
There is a complex mutual interplay between three-dimensional (3D) genome organization and cellular activities in bacteria and eukaryotes. The aim of this thesis is to investigate such structure-function relationships.
A main part of this thesis deals with the study of the three-dimensional genome organization using novel techniques for detecting genome-wide contacts using next-generation sequencing. These so called chromatin conformation capture-based methods, such as 5C and Hi-C, give deep insights into the architecture of the genome inside the nucleus, even on a small scale. We shed light on the question how the vastly increasing Hi-C data can generate new insights about the way the genome is organized in 3D.
To this end, we first present the typical Hi-C data processing workflow to obtain Hi-C contact maps and show potential pitfalls in the interpretation of such contact maps using our own data pipeline and publicly available Hi-C data sets. Subsequently, we focus on approaches to modeling 3D genome organization based on contact maps. In this context, a computational tool was developed which interactively visualizes contact maps alongside complementary genomic data tracks. Inspired by machine learning with the help of probabilistic graphical models, we developed a tool that detects the compartmentalization structure within contact maps on multiple scales. In a further project, we propose and test one possible mechanism for the observed compartmentalization within contact maps of genomes across multiple species: Dynamic formation of loops within domains.
In the context of 3D organization of bacterial chromosomes, we present the first direct evidence for global restructuring by long-range interactions of a DNA binding protein. Using Hi-C and live cell imaging of DNA loci, we show that the DNA binding protein Rok forms insulator-like complexes looping the B. subtilis genome over large distances. This biological mechanism agrees with our model based on dynamic formation of loops affecting domain formation in eukaryotic genomes. We further investigate the spatial segregation of the E. coli chromosome during cell division. In particular, we are interested in the positioning of the chromosomal replication origin region based on its interaction with the protein complex MukBEF. We tackle the problem using a combined approach of stochastic and polymer simulations.
Last but not least, we develop a completely new methodology to analyze single molecule localization microscopy images based on topological data analysis. By using this new approach in the analysis of irradiated cells, we are able to show that the topology of repair foci can be categorized depending the distance to heterochromatin
- …