23 research outputs found

    Chromosome3D: reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing

    The two compartment features highlighted in Chromosome 1 (left) and 2 (right) in the models reconstructed by Chromosome3D (top row) and PM2 (bottom row). (DOCX 761 kb)

    Data mining and machine learning methods for chromosome conformation data analysis

    Sixteen years after the Human Genome Project (HGP) sequenced the human genome, and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3D) inference and big data remain problematic in genomics, and specifically in 3C data analysis. Three-dimensional inference involves reconstructing a genome's 3D structure or, in some cases, an ensemble of structures from contact interaction frequencies extracted with a variant of the 3C technology called Hi-C. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; the location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. This dissertation describes four major contributions. The first is 3DMax, a tool for chromosome and genome 3D structure prediction from Hi-C data using an optimization algorithm. The second is GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years. The third is ClusterTAD, a method for extracting topologically associating domains (TADs) from Hi-C data using an unsupervised learning algorithm. The fourth is GenomeFlow, a comprehensive graphical tool that facilitates the entire process of modeling and analyzing 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are freely available to the scientific community as software tools. Includes bibliographical references.

    Development of New Computational Tools for Analyzing Hi-C Data and Predicting Three-Dimensional Genome Organization

    Background: The development of Hi-C (and related methods) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. There has been extensive effort to develop new tools for analyzing these data in order to better understand the relationship between 3D genomic structure and function. While useful, the existing tools are far from mature and, in some cases, lack the generalizability required for application to a diverse set of organisms. This is problematic because the research community has proposed many cross-species "hallmarks" of 3D genome organization without confirming their existence in a variety of organisms. Research Objective: Develop new, generalizable computational tools for Hi-C analysis and 3D genome prediction. Results: Three new computational tools were developed for Hi-C analysis or 3D genome prediction: GrapHi-C (visualization), GeneRHi-C (3D prediction) and StoHi-C (3D prediction). Each tool has the potential to be used for 3D genome analysis in both model and non-model organisms, since the underlying algorithms do not rely on any organism-specific constraints. A brief description of each tool follows. GrapHi-C is a graph-based visualization of Hi-C data. Unlike existing visualization methods, GrapHi-C allows for a more intuitive structural visualization of the underlying data. GeneRHi-C and StoHi-C are tools for predicting 3D genome organization from Hi-C data (the 3D genome reconstruction problem). GeneRHi-C uses a combination of mixed integer programming and network layout algorithms to generate 3D coordinates from a ploidy-dependent subset of the Hi-C data. Alternatively, StoHi-C uses t-distributed stochastic neighbour embedding (t-SNE) with the complete set of Hi-C data to generate 3D coordinates of the genome. Each tool was applied to multiple, independent existing Hi-C datasets from fission yeast to demonstrate its utility. This is the first time 3D genome prediction has been successfully applied to these datasets. Overall, the tools developed here recapitulated documented features of fission yeast genomic organization more clearly than existing techniques. Future work will focus on extending these tools and applying them to Hi-C datasets from other organisms. Additional Information: This thesis contains a collection of papers pertaining to the development of new tools for analyzing Hi-C data and predicting 3D genome organization. Each paper's publication status (as of January 2020) is provided at the beginning of the corresponding chapter. For published papers, reprint permission was obtained and is available in the appendix.

    EM algorithm for reconstructing 3D structures of human chromosomes from chromosomal contact data

    Recent research suggests that chromosomes adopt preferred spatial conformations to facilitate necessary long-range interactions and regulation within the nucleus. Obtaining the 3D shape of a genome's chromosomes is therefore important for understanding how the genome folds and interacts, which in turn can reveal more about the secrets of life. The introduction of chromosome conformation capture (3C) based techniques has spurred the development of methods for constructing 3D models of chromosome structure. Existing approaches fall into two groups: consensus methods, used in early work, and ensemble methods. In this paper I propose an ensemble method for reconstructing the 3D structure of chromosomes. The first step is to process and then normalize the Hi-C data. Next, I apply a Bayesian inference model to derive an objective function. Finally, I use an EM-based algorithm, with gradient descent applied in the expectation step. I applied the objective function and the optimization method to Hi-C data for all 23 chromosomes at a resolution of 1 Mb.
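The general pipeline named in this abstract (contact data, an objective function, gradient-based optimization) can be illustrated with a much simpler stand-in. The sketch below is not the author's Bayesian EM method. It assumes a common inverse-power conversion from contact frequency to target distance and minimizes a plain squared-error objective by gradient descent. The function names, the `alpha` exponent, the learning rate, and the toy matrix are all hypothetical choices made for illustration.

```python
import math
import random


def contacts_to_distances(contacts, alpha=1.0):
    """Map contact frequencies to target distances with d = 1 / f**alpha.

    This conversion is an assumption for illustration; zero-contact
    pairs are skipped and so place no constraint on the structure.
    """
    n = len(contacts)
    targets = {}
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j] > 0:
                targets[(i, j)] = 1.0 / contacts[i][j] ** alpha
    return targets


def stress(coords, targets):
    """Sum of squared differences between model and target distances."""
    return sum(
        (math.dist(coords[i], coords[j]) - d) ** 2
        for (i, j), d in targets.items()
    )


def reconstruct(contacts, steps=2000, lr=0.01, seed=0):
    """Fit one 3D point per bin by gradient descent on the stress."""
    rng = random.Random(seed)
    n = len(contacts)
    targets = contacts_to_distances(contacts)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for (i, j), d in targets.items():
            dist = math.dist(coords[i], coords[j]) or 1e-9  # avoid /0
            scale = 2.0 * (dist - d) / dist
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords, targets


if __name__ == "__main__":
    # Hypothetical 4-bin contact matrix: neighbours interact most often.
    toy = [
        [0.0, 1.0, 0.5, 0.25],
        [1.0, 0.0, 1.0, 0.5],
        [0.5, 1.0, 0.0, 1.0],
        [0.25, 0.5, 1.0, 0.0],
    ]
    coords, targets = reconstruct(toy)
    print("final stress:", stress(coords, targets))
```

A single run like this yields one consensus-style structure. An ensemble method, as described in the abstract, would instead produce a set of structures, for example by repeating the optimization from different random seeds.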

    Spatial clustering and common regulatory elements correlate with coordinated gene expression

    Many cellular responses to surrounding cues require temporally concerted transcriptional regulation of multiple genes. In prokaryotic cells, a single-input-module motif with one transcription factor regulating multiple target genes can generate coordinated gene expression. In eukaryotic cells, the transcriptional activity of a gene is affected not only by transcription factors but also by the epigenetic modifications and three-dimensional chromosome structure of the gene. To examine how local gene environment and transcription factor regulation are coupled, we performed a combined analysis of time-course RNA-seq data of TGF-β treated MCF10A cells and related epigenomic and Hi-C data. Using Dynamic Regulatory Events Miner (DREM), we clustered differentially expressed genes based on gene expression profiles and associated transcription factors. Genes in each class have similar temporal gene expression patterns and share common transcription factors. Next, we defined a set of linear and radial distribution functions, as used in statistical physics, to measure the distributions of genes within a class both spatially and linearly along the genomic sequence. Remarkably, genes within the same class, despite sometimes being separated by tens of million bases (Mb) along the genomic sequence, show a significantly higher tendency to be spatially close than those belonging to different classes do. Analyses extended to the process of mouse nervous system development arrived at similar conclusions. Future studies will be able to test whether this spatial organization of chromosomes contributes to concerted gene expression. Comment: 30 pages, 9 figures, accepted in PLoS Computational Biology
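The idea of a radial distribution function for a gene class can be sketched as a comparison of pairwise-distance histograms. The code below is an illustration only, not the definition used in the paper. It assumes each gene has a 3D position, and the function names, bin width, and example coordinates are hypothetical. A ratio above 1 in a short-distance bin means same-class pairs are over-represented at that distance relative to all pairs.

```python
import math
from itertools import combinations


def pair_distance_histogram(points, bin_width, n_bins):
    """Fraction of point pairs whose distance falls in each bin.

    Pairs farther apart than n_bins * bin_width still count toward
    the total, so the fractions may sum to less than 1.
    """
    counts = [0] * n_bins
    total = 0
    for p, q in combinations(points, 2):
        total += 1
        b = int(math.dist(p, q) // bin_width)
        if b < n_bins:
            counts[b] += 1
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def radial_enrichment(class_points, all_points, bin_width, n_bins):
    """Per-bin ratio of same-class pair fraction to all-pairs fraction.

    Bins with no background pairs are reported as NaN.
    """
    same = pair_distance_histogram(class_points, bin_width, n_bins)
    background = pair_distance_histogram(all_points, bin_width, n_bins)
    return [
        s / b if b > 0 else float("nan")
        for s, b in zip(same, background)
    ]


if __name__ == "__main__":
    # Hypothetical positions: three clustered genes from one class,
    # plus three genes from other classes placed farther away.
    clustered = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0)]
    others = [(5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (5.0, 5.0, 5.0)]
    ratios = radial_enrichment(clustered, clustered + others, 1.0, 10)
    print(ratios[0])
```

The linear counterpart mentioned in the abstract could be sketched the same way by replacing the 3D distance with the separation in base pairs along the genomic sequence.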

    Cis-regulatory chromatin loops arise before TADs and gene activation, and are independent of cell fate during early Drosophila development

    Acquisition of cell fate is thought to rely on the specific interaction of remote cis-regulatory modules (CRMs), for example, enhancers and target promoters. However, the precise interplay between chromatin structure and gene expression is still unclear, particularly within multicellular developing organisms. In the present study, we employ Hi-M, a single-cell spatial genomics approach, to detect CRM–promoter looping interactions within topologically associating domains (TADs) during early Drosophila development. By comparing cis-regulatory loops in alternate cell types, we show that physical proximity does not necessarily instruct transcriptional states. Moreover, multi-way analyses reveal that multiple CRMs spatially coalesce to form hubs. Loops and CRM hubs are established early during development, before the emergence of TADs. Moreover, CRM hubs are formed, in part, via the action of the pioneer transcription factor Zelda and precede transcriptional activation. Our approach provides insight into the role of CRM–promoter interactions in defining transcriptional states, as well as distinct cell types.
    Affiliation: Espínola, Sergio Martín. Centre National de la Recherche Scientifique; France. Institut National de la Santé et de la Recherche Médicale; France. Université de Montpellier. Centre de Biologie Structurale; France
    Affiliation: Götz, Markus. Université de Montpellier. Centre de Biologie Structurale; France. Centre National de la Recherche Scientifique; France. Institut National de la Santé et de la Recherche Médicale; France
    Affiliation: Bellec, Maelle. Centre National de la Recherche Scientifique; France. Université de Montpellier. Institut de Génétique Moléculaire de Montpellier; France. Université de Montpellier. Centre de Biologie Structurale; France. Institut National de la Santé et de la Recherche Médicale; France
    Affiliation: Messina, Olivier. Centre National de la Recherche Scientifique; France. Université de Montpellier. Institut de Génétique Moléculaire de Montpellier; France. Université de Montpellier. Centre de Biologie Structurale; France. Institut National de la Santé et de la Recherche Médicale; France
    Affiliation: Fiche, Jean Bernard. Université de Montpellier. Centre de Biologie Structurale; France. Centre National de la Recherche Scientifique; France. Institut National de la Santé et de la Recherche Médicale; France
    Affiliation: Houbron, Christophe. Université de Montpellier. Centre de Biologie Structurale; France. Institut National de la Santé et de la Recherche Médicale; France. Centre National de la Recherche Scientifique; France
    Affiliation: Dejean, Matthieu. Centre National de la Recherche Scientifique; France. Université de Montpellier. Institut de Génétique Moléculaire de Montpellier; France
    Affiliation: Reim, Ingolf. Universitat Erlangen Nuremberg; Germany
    Affiliation: Cardozo Gizzi, Andres Mauricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto Alberto C. Taquini de Investigaciones en Medicina Traslacional - Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones Cardiológicas "Prof. Dr. Alberto C. Taquini". Instituto Alberto C. Taquini de Investigaciones en Medicina Traslacional; Argentina
    Affiliation: Lagha, Mounia. Centre National de la Recherche Scientifique; France. Université de Montpellier. Institut de Génétique Moléculaire de Montpellier; France
    Affiliation: Nollmann, Marcelo. Centre National de la Recherche Scientifique; France. Institut National de la Santé et de la Recherche Médicale; France. Université de Montpellier. Centre de Biologie Structurale; France

    Structural modeling of the 3D genome using machine learning

    This dissertation, submitted as a partial requirement for completion of the Doctorate of Philosophy, outlines the research performed by Max Highsmith in the BDM Lab. This work includes a functional expansion of a three-dimensional genome conformation database, the development of a novel, deep-learning based strategy for the enhancement of Hi-C data, the development of a deep-learning approach for domain identification using epigenetic features, and the development of a novel computational tool for 4D modeling of chromosome dynamics. Includes bibliographical references.