314 research outputs found

    Inferring Diploid 3D Chromatin Structures from Hi-C Data

    The 3D organization of the genome plays a key role in many cellular processes, such as gene regulation, differentiation, and replication. Assays like Hi-C measure DNA-DNA contacts in a high-throughput fashion, and inferring accurate 3D models of chromosomes can yield insights hidden in the raw data. For example, structural inference can account for noise in the data, disambiguate the distinct structures of homologous chromosomes, orient genomic regions relative to nuclear landmarks, and serve as a framework for integrating other data types. Although many methods exist to infer the 3D structure of haploid genomes, inferring a diploid structure from Hi-C data is still an open problem. Indeed, the diploid case is very challenging, because Hi-C data typically does not distinguish between homologous chromosomes. We propose a method to infer 3D diploid genomes from Hi-C data. We demonstrate the accuracy of the method on simulated data, and we also use the method to infer 3D structures for mouse chromosome X, confirming that the inactive homolog exhibits a bipartite structure, whereas the active homolog does not

    3D Genome Reconstruction from Partially Phased Hi-C Data

    The 3-dimensional (3D) structure of the genome is of significant importance for many cellular processes. In this paper, we study the problem of reconstructing the 3D structure of chromosomes from Hi-C data of diploid organisms, which poses additional challenges compared to the better-studied haploid setting. With the help of techniques from algebraic geometry, we prove that a small amount of phased data is sufficient to ensure finite identifiability, both for noiseless and noisy data. In the light of these results, we propose a new 3D reconstruction method based on semidefinite programming, paired with numerical algebraic geometry and local optimization. The performance of this method is tested on several simulated datasets under different noise levels and with different amounts of phased data. We also apply it to a real dataset from mouse X chromosomes, and we are then able to recover previously known structural features

    Inferring Single-Cell 3D Chromosomal Structures Based On the Lennard-Jones Potential

    Reconstructing threeā€dimensional (3D) chromosomal structures based on singleā€cell Hiā€C data is a challenging scientific problem due to the extreme sparseness of the singleā€cell Hiā€C data. In this research, we used the Lennardā€Jones potential to reconstruct both 500 kb and highā€resolution 50 kb chromosomal structures based on singleā€cell Hiā€C data. A chromosome was represented by a string of 500 kb or 50 kb DNA beads and put into a 3D cubic lattice for simulations. A 2D Gaussian function was used to impute the sparse singleā€cell Hiā€C contact matrices. We designed a novel loss function based on the Lennardā€Jones potential, in which the Īµ value, i.e., the well depth, was used to indicate how stable the binding of every pair of beads is. For the bead pairs that have singleā€cell Hiā€C contacts and their neighboring bead pairs, the loss function assigns them stronger binding stability. The Metropolisā€“Hastings algorithm was used to try different locations for the DNA beads, and simulated annealing was used to optimize the loss function. We proved the correctness and validness of the reconstructed 3D structures by evaluating the models according to multiple criteria and comparing the models with 3Dā€FISH data

    Hi-C-constrained physical models of human chromosomes recover functionally-related properties of genome organization

    Combining genome-wide structural models with phenomenological data is at the forefront of efforts to understand the organizational principles regulating the human genome. Here, we use chromosome-chromosome contact data as knowledge-based constraints for large-scale three-dimensional models of the human diploid genome. The resulting models remain minimally entangled and acquire several functional features that are observed in vivo and that were never used as input for the model. We find, for instance, that gene-rich, active regions are drawn towards the nuclear center, while gene-poor and lamina-associated domains are pushed to the periphery. These and other properties persist upon adding local contact constraints, suggesting their compatibility with non-local constraints for the genome organization. The results show that suitable combinations of data analysis and physical modelling can expose the unexpectedly rich functionally-related properties implicit in chromosome-chromosome contact data. Specific directions are suggested for further developments based on combining experimental data analysis and genomic structural modelling

    SCL: A Lattice-Based Approach to Infer Three-Dimensional Chromosome Structures from Single-Cell Hi-C Data

    In contrast to population-based Hi-C data, single-cell Hi-C data are zero-inflated and do not indicate the frequency of proximate DNA segments. There are a limited number of computational tools that can model the three-dimensional structures of chromosomes based on single-cell Hi-C data. We developed SCL (Single-Cell Lattice), a computational method to reconstruct three-dimensional (3D) structures of chromosomes based on single-cell Hi-C data. We designed a loss function and a 2D Gaussian function specifically for the characteristics of single-cell Hi-C data. A chromosome is represented as beads-on-a-string and stored in a 3D cubic lattice. Metropolis-Hastings simulation and simulated annealing are used to simulate the structure and minimize the loss function. We evaluated the SCL-inferred 3D structures (at both 500 kb and 50 kb resolutions) using multiple criteria and compared them with the ones generated by another modeling software program. The results indicate that the 3D structures generated by SCL closely fit single-cell Hi-C data. We also found similar patterns of trans-chromosomal contact beads, Lamin-B1 enriched topological domains, and H3K4me3 enriched domains by mapping data from previous studies onto the SCL-inferred 3D structures
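
    One way to picture the role of a 2D Gaussian function on a sparse single-cell contact matrix is to spread each observed contact onto its neighboring bead pairs with a weight that decays with distance in matrix coordinates. The sketch below does exactly that and is only an assumed illustration: the abstract does not give SCL's actual function, and the name gaussian_impute and the parameters sigma and radius are hypothetical.

```python
import math


def gaussian_impute(contacts, sigma=1.0, radius=2):
    """Spread each nonzero entry of a square contact matrix onto nearby
    entries using an unnormalized 2D Gaussian weight.

    contacts: list of lists (n x n) of contact counts, mostly zeros.
    sigma, radius: assumed kernel width and cutoff, in units of bins.
    """
    n = len(contacts)
    imputed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            count = contacts[i][j]
            if count == 0:
                continue
            # Add a Gaussian bump centered on the observed contact (i, j).
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        weight = math.exp(-(di * di + dj * dj) / (2.0 * sigma ** 2))
                        imputed[a][b] += count * weight
    return imputed
```

    A single contact at the center of a 5 x 5 matrix keeps its full weight at that position and contributes smaller values to surrounding bead pairs, which gives an optimizer a smoother signal than the zero-inflated raw matrix.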

    Development of New Computational Tools for Analyzing Hi-C Data and Predicting Three-Dimensional Genome Organization

    Background: The development of Hi-C (and related methods) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. There has been extensive effort in developing new tools to analyze this data in order to better understand the relationship between 3D genomic structure and function. While useful, the existing tools are far from maturity and (in some cases) lack the generalizability that would be required for application in a diverse set of organisms. This is problematic since the research community has proposed many cross-species "hallmarks" of 3D genome organization without confirming their existence in a variety of organisms. Research Objective: Develop new, generalizable computational tools for Hi-C analysis and 3D genome prediction. Results: Three new computational tools were developed for Hi-C analysis or 3D genome prediction: GrapHi-C (visualization), GeneRHi-C (3D prediction) and StoHi-C (3D prediction). Each tool has the potential to be used for 3D genome analysis in both model and non-model organisms since the underlying algorithms do not rely on any organism-specific constraints. A brief description of each tool follows. GrapHi-C is a graph-based visualization of Hi-C data. Unlike existing visualization methods, GrapHi-C allows for a more intuitive structural visualization of the underlying data. GeneRHi-C and StoHi-C are tools that can be used to predict 3D genome organizations from Hi-C data (the 3D-genome reconstruction problem). GeneRHi-C uses a combination of mixed integer programming and network layout algorithms to generate 3D coordinates from a ploidy-dependent subset of the Hi-C data. Alternatively, StoHi-C uses t-distributed stochastic neighbour embedding with the complete set of Hi-C data to generate 3D coordinates of the genome. Each tool was applied to multiple, independent existing Hi-C datasets from fission yeast to demonstrate their utility. This is the first time 3D genome prediction has been successfully applied to these datasets. Overall, the tools developed here more clearly recapitulated documented features of fission yeast genomic organization when compared to existing techniques. Future work will focus on extending and applying these tools to analyze Hi-C datasets from other organisms. Additional Information: This thesis contains a collection of papers pertaining to the development of new tools for analyzing Hi-C data and predicting 3D genome organization. Each paper's publication status (as of January 2020) has been provided at the beginning of the corresponding chapter. For published papers, reprint permission was obtained and is available in the appendix