Search CORE

875 research outputs found

Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

Author: Xia Kelin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/07/2017
Field of study

In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for the biomolecular data analysis. With the combination of spectral graph method, I reveal the essential difference between the global scale models and local scale ones in structure clustering, i.e., different optimization on Euclidean (or spatial) distances and sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For a biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and deliver different clusterings based on my purposes. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that in global scale, the Fiedler vector from my SeqMM bears a great similarity with the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated from different scales are not consistent and vary a lot. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistence. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries in different cluster numbers.Comment: 22 PAGES, 13 FIGURE

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

FigShare

Inferring Single-Cell 3D Chromosomal Structures Based On the Lennard-Jones Potential

Author: Wang Nan
Wang Zheng
Zha Mengsheng
Zhang Chaoyang
Publication venue: The Aquila Digital Community
Publication date: 31/05/2021
Field of study

Reconstructing three‐dimensional (3D) chromosomal structures based on single‐cell Hi‐C data is a challenging scientific problem due to the extreme sparseness of the single‐cell Hi‐C data. In this research, we used the Lennard‐Jones potential to reconstruct both 500 kb and high‐resolution 50 kb chromosomal structures based on single‐cell Hi‐C data. A chromosome was represented by a string of 500 kb or 50 kb DNA beads and put into a 3D cubic lattice for simulations. A 2D Gaussian function was used to impute the sparse single‐cell Hi‐C contact matrices. We designed a novel loss function based on the Lennard‐Jones potential, in which the ε value, i.e., the well depth, was used to indicate how stable the binding of every pair of beads is. For the bead pairs that have single‐cell Hi‐C contacts and their neighboring bead pairs, the loss function assigns them stronger binding stability. The Metropolis–Hastings algorithm was used to try different locations for the DNA beads, and simulated annealing was used to optimize the loss function. We proved the correctness and validness of the reconstructed 3D structures by evaluating the models according to multiple criteria and comparing the models with 3D‐FISH data

Aquila Digital Community

PubMed Central

University of Miami: Scholarship Miami

Recommended from our members

Assessing stationary distributions derived from chromatin contact maps.

Author: Fletez-Brant Kipper
Segal Mark R
Publication venue: eScholarship, University of California
Publication date: 24/02/2020
Field of study

BACKGROUND:The spatial configuration of chromosomes is essential to various cellular processes, notably gene regulation, while architecture related alterations, such as translocations and gene fusions, are often cancer drivers. Thus, eliciting chromatin conformation is important, yet challenging due to compaction, dynamics and scale. However, a variety of recent assays, in particular Hi-C, have generated new details of chromatin structure, spawning a number of novel biological findings. Many findings have resulted from analyses on the level of native contact data as generated by the assays. Alternatively, reconstruction based approaches often proceed by first converting contact frequencies into distances, then generating a three dimensional (3D) chromatin configuration that best recapitulates these distances. Subsequent analyses can enrich contact level analyses via superposition of genomic attributes on the reconstruction. But, such advantages depend on the accuracy of the reconstruction which, absent gold standards, is inherently difficult to assess. Attempts at accuracy evaluation have relied on simulation and/or FISH imaging that typically features a handful of low resolution probes. While newly advanced multiplexed FISH imaging offers possibilities for refined 3D reconstruction accuracy evaluation, availability of such data is limited due to assay complexity and the resolution thereof is appreciably lower than the reconstructions being assessed. Accordingly, there is demand for new methods of reconstruction accuracy appraisal. RESULTS:Here we explore the potential of recently proposed stationary distributions, hereafter StatDns, derived from Hi-C contact matrices, to serve as a basis for reconstruction accuracy assessment. Current usage of such StatDns has focussed on the identification of highly interactive regions (HIRs): computationally defined regions of the genome purportedly involved in numerous long-range intra-chromosomal contacts. Consistent identification of HIRs would be informative with respect to inferred 3D architecture since the corresponding regions of the reconstruction would have an elevated number of k nearest neighbors (kNNs). More generally, we anticipate a monotone decreasing relationship between StatDn values and kNN distances. After initially evaluating the reproducibility of StatDns across replicate Hi-C data sets, we use this implied StatDn - kNN relationship to gauge the utility of StatDns for reconstruction validation, making recourse to both real and simulated examples. CONCLUSIONS:Our analyses demonstrate that, as constructed, StatDns do not provide a suitable measure for assessing the accuracy of 3D genome reconstructions. Whether this is attributable to specific choices surrounding normalization in defining StatDns or to the logic underlying their very formulation remains to be determined

eScholarship - University of California

Data mining and machine learning methods for chromosome conformation data analysis

Author: Oluwadare Oluwatosin
Publication venue: 'University of Missouri Libraries'
Publication date
Field of study

Sixteen years after the sequencing of the human genome, the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3-D) inference and big data remains problematic in the field of genomics, and specifically, in the field of 3C data analysis. Three-dimensional inference involves the reconstruction of a genome's 3D structure or, in some cases, ensemble of structures from contact interaction frequencies extracted from a variant of the 3C technology called the Hi-C technology. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. In this dissertation, four major contributions are described, first, 3DMax, a tool for chromosome and genome 3-D structure prediction from H-C data using optimization algorithm, second, GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years, third, ClusterTAD, a method for topological associated domains (TAD) extraction from Hi-C data using unsupervised learning algorithm. Finally, we introduce a tool called, GenomeFlow, a comprehensive graphical tool to facilitate the entire process of modeling and analysis of 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are available as software tools that are freely available to the scientific community.Includes bibliographical reference

University of Missouri: MOspace

EM algorithm for reconstructing 3D structures of human chromosomes from chromosomal contact data

Author: Zhang Yuxiang
Publication venue: University of Missouri--Columbia
Publication date
Field of study

Recent research suggested that chromosomes have preferred spatial conformations to facilitate necessary long-range interactions and regulations within a nucleus. So that, getting the 3D shape of chromosomes of a genome is very important for understanding how the genome folds and how the genome interact, which can know more about the secrete of life. The introduction of the chromosome conformation capture (3C) based techniques has risen the development of construct the 3D structure of chromosome model. Several works have been done to build the 3D model, among which can be divided into two groups one is consensus methods in early work, the other is ensemble method. In this paper I proposed an ensemble method for reconstructing the 3D structure of chromosome structure. First step is to process Hi-C data, and then do normalization. After that I applied the Bayesian inference model to get an objective function. Finally I used EM based algorithm along with using gradient descent method which is applied in expectation step. I applied the objective function and the optimization method to all 23 Hi-C chromosomal data at a resolution of 1MB

University of Missouri: MOspace

Reconstruction of 3D genome architecture via a two-stage algorithm

Author: Henrik L. Bengtsson
Mark R. Segal
Publication venue: Springer Nature
Publication date: 01/01/2015
Field of study

Springer - Publisher Connector

Analysis methods for studying the 3D architecture of the genome

Author
Publication venue: BioMed Central
Publication date: 02/09/2015
Field of study

Springer - Publisher Connector

Development of New Computational Tools for Analyzing Hi-C Data and Predicting Three-Dimensional Genome Organization

Author: MacKay Kimberly
Publication venue: 'University of Saskatchewan Library'
Publication date: 29/06/2020
Field of study

Background: The development of Hi-C (and related methods) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. There has been extensive effort in developing new tools to analyze this data in order to better understand the relationship between 3D genomic structure and function. While useful, the existing tools are far from maturity and (in some cases) lack the generalizability that would be required for application in a diverse set of organisms. This is problematic since the research community has proposed many cross-species "hallmarks" of 3D genome organization without confirming their existence in a variety of organisms. Research Objective: Develop new, generalizable computational tools for Hi-C analysis and 3D genome prediction. Results: Three new computational tools were developed for Hi-C analysis or 3D genome prediction: GrapHi-C (visualization), GeneRHi-C (3D prediction) and StoHi-C (3D prediction). Each tool has the potential to be used for 3D genome analysis in both model and non-model organisms since the underlying algorithms do not rely on any organism-specific constraints. A brief description of each tool follows. GrapHi-C is a graph-based visualization of Hi-C data. Unlike existing visualization methods, GrapHi-C allows for a more intuitive structural visualization of the underlying data. GeneRHi-C and StoHi-C are tools that can be used to predict 3D genome organizations from Hi-C data (the 3D-genome reconstruction problem). GeneRHi-C uses a combination of mixed integer programming and network layout algorithms to generate 3D coordinates from a ploidy-dependent subset of the Hi-C data. Alternatively, StoHi-C uses t-stochastic neighbour embedding with the complete set of Hi-C data to generate 3D coordinates of the genome. Each tool was applied to multiple, independent existing Hi-C datasets from fission yeast to demonstrate their utility. This is the first time 3D genome prediction has been successfully applied to these datasets. Overall, the tools developed here more clearly recapitulated documented features of fission yeast genomic organization when compared to existing techniques. Future work will focus on extending and applying these tools to analyze Hi-C datasets from other organisms. Additional Information: This thesis contains a collection of papers pertaining to the development of new tools for analyzing Hi-C data and predicting 3D genome organization. Each paper's publication status (as of January 2020) has been provided at the beginning of the corresponding chapter. For published papers, reprint permission was obtained and is available in the appendix

University of Saskatchewan Research Archive

Chromosome3D: reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing

Author: Badri Adhikari
Jianlin Cheng
Tuan Trieu
Publication venue: Springer Nature
Publication date: 01/01/2016
Field of study

The two compartment features highlighted in Chromosome 1 (left) and 2 (right) in the models reconstructed by Chromosome3D (top row) and PM2 (bottom row). (DOCX 761Â kb

Springer - Publisher Connector

PubMed Central

University of Missouri, St. Louis

FigShare

SCL: A Lattice-Based Approach to Infer Three-Dimensional Chromosome Structures from Single-Cell Hi-C Data

Author: Zhu Hao
Publication venue: The Aquila Digital Community
Publication date: 01/05/2019
Field of study

In contrast to population-based Hi-C data, single-cell Hi-C data are zero-inflated and do not indicate the frequency of proximate DNA segments. There are a limited number of computational tools that can model the three-dimensional structures of chromosomes based on single-cell Hi-C data. We developed SCL (Single-Cell Lattice), a computational method to reconstruct three-dimensional (3D) structures of chromosomes based on single-cell Hi-C data. We designed a loss function and a 2D Gaussian function specifically for the characteristics of single-cell Hi-C data. A chromosome is represented as beads-on-a-string and stored in a 3D cubic lattice. Metropolis-Hastings simulation and simulated annealing are used to simulate the structure and minimize the loss function. We evaluated the SCL-inferred 3D structures (at both 500 kb and 50 kb resolutions) using multiple criteria and compared them with the ones generated by another modeling software program. The results indicate that the 3D structures generated by SCL closely fit single-cell Hi-C data. We also found similar patterns of trans-chromosomal contact beads, Lamin-B1 enriched topological domains, and H3K4me3 enriched domains by mapping data from previous studies onto the SCL-inferred 3D structures

Aquila Digital Community