7,228 research outputs found

Algorithms for reconstruction of chromosomal structures

Publication venue: BioMed Central
Publication date: 19/01/2016

Springer - Publisher Connector

Assessing stationary distributions derived from chromatin contact maps.

Author: Fletez-Brant Kipper
Segal Mark R
Publication venue: eScholarship, University of California
Publication date: 24/02/2020

BACKGROUND: The spatial configuration of chromosomes is essential to various cellular processes, notably gene regulation, while architecture-related alterations, such as translocations and gene fusions, are often cancer drivers. Thus, eliciting chromatin conformation is important, yet challenging due to compaction, dynamics and scale. However, a variety of recent assays, in particular Hi-C, have generated new details of chromatin structure, spawning a number of novel biological findings. Many findings have resulted from analyses at the level of native contact data as generated by the assays. Alternatively, reconstruction-based approaches often proceed by first converting contact frequencies into distances, then generating a three-dimensional (3D) chromatin configuration that best recapitulates these distances. Subsequent analyses can enrich contact-level analyses via superposition of genomic attributes on the reconstruction. But such advantages depend on the accuracy of the reconstruction which, absent gold standards, is inherently difficult to assess. Attempts at accuracy evaluation have relied on simulation and/or FISH imaging that typically features a handful of low-resolution probes. While newly advanced multiplexed FISH imaging offers possibilities for refined 3D reconstruction accuracy evaluation, availability of such data is limited due to assay complexity, and its resolution is appreciably lower than that of the reconstructions being assessed. Accordingly, there is demand for new methods of reconstruction accuracy appraisal.

RESULTS: Here we explore the potential of recently proposed stationary distributions, hereafter StatDns, derived from Hi-C contact matrices, to serve as a basis for reconstruction accuracy assessment. Current usage of such StatDns has focussed on the identification of highly interactive regions (HIRs): computationally defined regions of the genome purportedly involved in numerous long-range intra-chromosomal contacts. Consistent identification of HIRs would be informative with respect to inferred 3D architecture, since the corresponding regions of the reconstruction would have an elevated number of k nearest neighbors (kNNs). More generally, we anticipate a monotone decreasing relationship between StatDn values and kNN distances. After initially evaluating the reproducibility of StatDns across replicate Hi-C data sets, we use this implied StatDn-kNN relationship to gauge the utility of StatDns for reconstruction validation, making recourse to both real and simulated examples.

CONCLUSIONS: Our analyses demonstrate that, as constructed, StatDns do not provide a suitable measure for assessing the accuracy of 3D genome reconstructions. Whether this is attributable to specific choices surrounding normalization in defining StatDns or to the logic underlying their very formulation remains to be determined.
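
The two quantities compared in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible construction: the contact matrix is row-normalised into a Markov transition matrix, and its stationary distribution is found by power iteration. The abstract does not specify the normalization behind StatDns, so the function names, the normalization, and the kNN summary (mean distance to the k nearest loci) are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def stationary_distribution(contacts, n_iter=1000, tol=1e-12):
    """Stationary distribution of the row-normalised contact matrix.

    Assumption: plain row normalisation; published StatDn definitions
    may normalise the Hi-C matrix differently.
    """
    c = np.asarray(contacts, dtype=float)
    row_sums = c.sum(axis=1)
    if np.any(row_sums <= 0):
        raise ValueError("every locus needs at least one contact")
    p = c / row_sums[:, None]              # row-stochastic transition matrix
    pi = np.full(len(c), 1.0 / len(c))     # start from the uniform distribution
    for _ in range(n_iter):
        nxt = pi @ p                       # one step of power iteration
        converged = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if converged:
            break
    return pi / pi.sum()


def mean_knn_distance(coords, k):
    """Mean Euclidean distance from each locus to its k nearest neighbours."""
    x = np.asarray(coords, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)          # column 0 is the zero self-distance
    return d_sorted[:, 1:k + 1].mean(axis=1)
```

Under this plain row normalisation, a symmetric contact matrix yields a stationary distribution proportional to its row sums, which is one reason the choice of normalization matters. The anticipated monotone decreasing relationship could then be checked with a rank correlation between the two returned vectors.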

eScholarship - University of California

The Rabl configuration limits topological entanglement of chromosomes in budding yeast.

Author: Arsuaga Javier
Burgess Sean
Cruz Brian
Pouokam Maxime
Segal Mark R
Vazquez Mariel
Publication venue: eScholarship, University of California
Publication date: 01/05/2019

The three-dimensional organization of genomes remains mostly unknown due to their high degree of condensation. Biophysical studies predict that condensation promotes the topological entanglement of chromatin fibers and the inhibition of function. How organisms balance functionally active genomes against a high degree of condensation remains to be determined. Here we hypothesize that the Rabl configuration, characterized by the attachment of centromeres and telomeres to the nuclear envelope, helps to reduce the topological entanglement of chromosomes. To test this hypothesis we developed a novel method to quantify chromosome entanglement complexity in 3D reconstructions obtained from Chromosome Conformation Capture (CCC) data. Applying this method to published data of the yeast genome, we show that computational models implementing the attachment of telomeres or centromeres alone are not sufficient to obtain the reduced entanglement complexity observed in 3D reconstructions. It is only when both the centromeres and telomeres are attached to the nuclear envelope (i.e. the Rabl configuration) that the complexity of entanglement of the genome is comparable to that of the 3D reconstructions. We therefore suggest that the Rabl configuration is an essential player in the simplification of the entanglement of chromatin fibers.

eScholarship - University of California

Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

Author: Xia Kelin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/07/2017

In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. By combining it with the spectral graph method, I reveal the essential difference between global scale models and local scale ones in structure clustering, i.e., different optimization of Euclidean (or spatial) distances and sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM bears a great similarity to the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated at different scales are inconsistent and vary considerably. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries across different cluster numbers.
Comment: 22 pages, 13 figures
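
The Fiedler vector mentioned in this abstract can be illustrated with a generic spectral sketch. The code below is not SeqMM: it has no sequence-scale parameter and simply treats a symmetric contact matrix as graph edge weights, builds the unnormalised graph Laplacian, and returns the eigenvector of the second-smallest eigenvalue. The function name and the symmetrisation step are assumptions made for illustration.

```python
import numpy as np


def fiedler_vector(weights):
    """Fiedler vector of the unnormalised Laplacian of a weighted graph.

    Assumption: `weights` is a square, non-negative contact matrix that is
    read as edge weights; self-contacts (the diagonal) are ignored.
    """
    w = np.asarray(weights, dtype=float)
    w = (w + w.T) / 2.0                    # enforce symmetry
    np.fill_diagonal(w, 0.0)               # drop self-loops
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, vecs = np.linalg.eigh(laplacian)
    return vecs[:, 1]                      # eigenvector of the 2nd-smallest eigenvalue
```

The sign pattern of the returned vector gives a two-way partition of the loci, which is the sense in which a Fiedler vector can be compared with a principal component when studying compartments.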

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

FigShare

Reconstruction of 3D genome architecture via a two-stage algorithm

Author: Henrik L. Bengtsson
Mark R. Segal
Publication venue: Springer Nature
Publication date: 01/01/2015

Springer - Publisher Connector

Data mining and machine learning methods for chromosome conformation data analysis

Author: Oluwadare Oluwatosin
Publication venue: 'University of Missouri Libraries'

Sixteen years after the sequencing of the human genome by the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3D) inference and big data remain problematic in genomics, and specifically in 3C data analysis. Three-dimensional inference involves the reconstruction of a genome's 3D structure or, in some cases, an ensemble of structures from contact interaction frequencies extracted from a variant of the 3C technology called Hi-C. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; the location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. This dissertation describes four major contributions: first, 3DMax, a tool for chromosome and genome 3D structure prediction from Hi-C data using an optimization algorithm; second, GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years; third, ClusterTAD, a method for extracting topologically associating domains (TADs) from Hi-C data using an unsupervised learning algorithm; and finally, GenomeFlow, a comprehensive graphical tool to facilitate the entire process of modeling and analysis of 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are freely available to the scientific community as software tools. Includes bibliographical references.
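
The reconstruction task described in this abstract, going from contact frequencies to a 3D structure, can be sketched in two steps. The example below is not 3DMax: it swaps in classical multidimensional scaling (MDS) as a simple stand-in for the optimization, and it assumes a power-law conversion d = c^(-alpha) from contacts to distances. The function names, the default value of `alpha`, and the requirement of strictly positive off-diagonal contacts are all illustrative assumptions.

```python
import numpy as np


def contacts_to_distances(contacts, alpha=1.0):
    """Convert contact frequencies to distances via d = c ** (-alpha).

    Assumption: every off-diagonal contact is strictly positive; real
    Hi-C matrices contain zeros that need separate handling.
    """
    c = np.asarray(contacts, dtype=float)
    off_diag = ~np.eye(len(c), dtype=bool)
    if np.any(c[off_diag] <= 0):
        raise ValueError("sketch assumes strictly positive off-diagonal contacts")
    d = np.zeros_like(c)                   # self-distances stay at zero
    d[off_diag] = c[off_diag] ** (-alpha)
    return d


def classical_mds(distances, dim=3):
    """Embed points in `dim` dimensions from a pairwise distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = len(d)
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ (d ** 2) @ centering   # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]               # largest eigenvalues first
    # clip small negative eigenvalues caused by noise or non-Euclidean input
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

The coordinates returned by `classical_mds(contacts_to_distances(contacts))` are determined only up to rotation, reflection and translation, so two reconstructions should be compared through their pairwise distances or after alignment.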

University of Missouri: MOspace

Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

Author: Alessandro Tanca (494538)
Antonio Palomba (494539)
Cristina Fraumene (374294)
Edoardo Fiorillo (518797)
Francesco Cucca (145742)
Marcello Abbondio (3706183)
Sergio Uzzau (186221)
Valeria Manghina (3498188)
Publication date: 22/03/2019

Background. A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or through the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

arXiv.org e-Print Archive

FigShare

The genome of the medieval Black Death agent (extended abstract)

Author: Chauve Cedric
Rajaraman Ashok
Tannier Eric
Publication date: 29/07/2013

The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most extant Yersinia pestis species, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, which aim to reconstruct the organization of ancestral genomes from the comparison of extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to exceptional genome plasticity and an increase in rearrangement rate.
Comment: Extended abstract of a talk presented at the conference JOBIM 2013, https://colloque.inra.fr/jobim2013_eng/. Full paper submitted

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

Finding undetected protein associations in cell signaling by belief propagation

Author: A. Braunstein
A. Dagkessamanskaia
Aronova
Bashor
Bayati
Benayoun
Burkholder
C. Borgs
Dickson
Guldener
Huang
J. Chayes
J.-M. Francois
Jenness
King
Kuranda
Locasale
M. Bailly-Bechet
Miyake
Nickell
Pei
R. Zecchina
Roberts
Scott
Soufi
Spode
Thattai
Travers
Xenarios
Yosef
Zheng
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2011

External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.
Comment: 6 pages, 3 figures, 1 table, Supporting Information

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Hal-Diderot

PORTO Publications Open Repository TOrino