Assessing stationary distributions derived from chromatin contact maps.
BACKGROUND: The spatial configuration of chromosomes is essential to various cellular processes, notably gene regulation, while architecture-related alterations, such as translocations and gene fusions, are often cancer drivers. Thus, eliciting chromatin conformation is important, yet challenging due to compaction, dynamics and scale. However, a variety of recent assays, in particular Hi-C, have generated new details of chromatin structure, spawning a number of novel biological findings. Many findings have resulted from analyses on the level of native contact data as generated by the assays. Alternatively, reconstruction-based approaches often proceed by first converting contact frequencies into distances, then generating a three-dimensional (3D) chromatin configuration that best recapitulates these distances. Subsequent analyses can enrich contact-level analyses via superposition of genomic attributes on the reconstruction. But such advantages depend on the accuracy of the reconstruction which, absent gold standards, is inherently difficult to assess. Attempts at accuracy evaluation have relied on simulation and/or FISH imaging that typically features a handful of low-resolution probes. While newly advanced multiplexed FISH imaging offers possibilities for refined 3D reconstruction accuracy evaluation, availability of such data is limited due to assay complexity, and its resolution is appreciably lower than that of the reconstructions being assessed. Accordingly, there is demand for new methods of reconstruction accuracy appraisal.
RESULTS: Here we explore the potential of recently proposed stationary distributions, hereafter StatDns, derived from Hi-C contact matrices, to serve as a basis for reconstruction accuracy assessment. Current usage of such StatDns has focussed on the identification of highly interactive regions (HIRs): computationally defined regions of the genome purportedly involved in numerous long-range intra-chromosomal contacts. Consistent identification of HIRs would be informative with respect to inferred 3D architecture since the corresponding regions of the reconstruction would have an elevated number of k nearest neighbors (kNNs). More generally, we anticipate a monotone decreasing relationship between StatDn values and kNN distances. After initially evaluating the reproducibility of StatDns across replicate Hi-C data sets, we use this implied StatDn-kNN relationship to gauge the utility of StatDns for reconstruction validation, making recourse to both real and simulated examples.
CONCLUSIONS: Our analyses demonstrate that, as constructed, StatDns do not provide a suitable measure for assessing the accuracy of 3D genome reconstructions. Whether this is attributable to specific choices surrounding normalization in defining StatDns or to the logic underlying their very formulation remains to be determined.
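The StatDn-kNN comparison described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how StatDns are normalized, so the code assumes the simplest construction, in which the contact matrix is row-normalized into a Markov transition matrix and its stationary distribution is found by power iteration. The function names `stationary_distribution` and `knn_mean_distance`, the toy contact matrix, and the toy coordinates are all hypothetical.

```python
import math


def stationary_distribution(contacts, tol=1e-12, max_iter=10000):
    """Stationary distribution of the random walk on a contact matrix.

    Assumption: rows are normalized to sum to 1 to form the transition
    matrix; the published StatDn definition may normalize differently.
    """
    n = len(contacts)
    transition = []
    for row in contacts:
        total = sum(row)
        if total <= 0:
            raise ValueError("every bin needs at least one contact")
        transition.append([c / total for c in row])

    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of power iteration: pi_next = pi * P
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def knn_mean_distance(coords, k):
    """Mean Euclidean distance from each point to its k nearest neighbors."""
    result = []
    for i, p in enumerate(coords):
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        result.append(sum(dists[:k]) / k)
    return result


# Toy symmetric contact matrix (3 genomic bins) and toy 3D coordinates.
toy_contacts = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

statdn = stationary_distribution(toy_contacts)
knn = knn_mean_distance(toy_coords, k=1)
```

For a symmetric contact matrix under this row normalization, the stationary distribution is proportional to the row sums, which is one reason the choice of normalization matters. Checking the anticipated monotone decreasing relationship would then amount to comparing `statdn` against `knn` bin by bin, for example with a rank correlation.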
Structural modeling of the 3D genome using machine learning
This dissertation, submitted as a partial requirement for completion of the Doctorate of Philosophy, outlines the research performed by Max Highsmith in the BDM Lab. This work includes a functional expansion of a three-dimensional genome conformation database, the development of a novel, deep-learning-based strategy for the enhancement of Hi-C data, the development of a deep learning approach for domain identification using epigenetic features, and the development of a novel computational tool for 4D modeling of chromosome dynamics. Includes bibliographical references.
Chromosome3D: reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing
The two compartment features highlighted in Chromosome 1 (left) and 2 (right) in the models reconstructed by Chromosome3D (top row) and PM2 (bottom row). (DOCX 761 kb)
MiOS, an integrated imaging and computational strategy to model gene folding with nucleosome resolution
The linear sequence of DNA provides invaluable information about genes and their regulatory elements along chromosomes. However, to fully understand gene function and regulation, we need to dissect how genes physically fold in the three-dimensional nuclear space. Here we describe immuno-OligoSTORM, an imaging strategy that reveals the distribution of nucleosomes within specific genes in super-resolution, through the simultaneous visualization of DNA and histones. We combine immuno-OligoSTORM with restraint-based and coarse-grained modeling approaches to integrate super-resolution imaging data with Hi-C contact frequencies and deconvoluted micrococcal nuclease-sequencing information. The resulting method, called Modeling immuno-OligoSTORM, allows quantitative modeling of genes with nucleosome resolution and provides information about chromatin accessibility for regulatory factors, such as RNA polymerase II. With Modeling immuno-OligoSTORM, we explore intercellular variability, transcriptional-dependent gene conformation, and folding of housekeeping and pluripotency-related genes in human pluripotent and differentiated cells, thereby obtaining, to our knowledge, the highest degree of data integration achieved so far.
Data mining and machine learning methods for chromosome conformation data analysis
Sixteen years after the sequencing of the human genome by the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3D) inference and big data remain problematic in the field of genomics, and specifically in the field of 3C data analysis. Three-dimensional inference involves the reconstruction of a genome's 3D structure or, in some cases, an ensemble of structures from contact interaction frequencies extracted from a variant of the 3C technology called Hi-C. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. In this dissertation, four major contributions are described: first, 3DMax, a tool for chromosome and genome 3D structure prediction from Hi-C data using an optimization algorithm; second, GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years; third, ClusterTAD, a method for extracting topologically associating domains (TADs) from Hi-C data using an unsupervised learning algorithm; and finally, GenomeFlow, a comprehensive graphical tool to facilitate the entire process of modeling and analysis of 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are available as software tools that are freely available to the scientific community. Includes bibliographical references.