Computational modeling to elucidate molecular mechanisms of epigenetic memory

Author: Alsing, Andrews, Ang, Angel, Arnold, Bannister, Barth, Beisel, Benecke, Bhattacharya, Binder, Black, Buscaino, Bushey, Canzio, Cowieson, David-Rus, Dayarian, Deal, Dodd, Fonseca, Freddolino, Gaffney, Grauffel, Greer, Grewal, Grigoryev, Hanna, Hathaway, Henikoff, Herranz, Hodges, Jacobs, Korolev, Korolev, Kouzarides, Ku, Kulaeva, Lin, Margreitter, Margueron, Mendenhall, Muller, Papp, Pasque, Peinado, Piana, Potoyan, Prohaska, Raghavan, Rohlf, Sanbonmatsu, Sanli, Schwab, Schwammle, Sedighi, So, Sontag, Steffen, Takahashi, Teif, Thon, Wocjan, Xu, Yang, Yuan, Zee, Zhang, Zhang
Publication date: 01/01/2015

How do mammalian cells that share the same genome exist in notably distinct phenotypes, exhibiting differences in morphology, gene expression patterns, and epigenetic chromatin statuses? Furthermore, how do cells of different phenotypes differentiate reproducibly from a single fertilized egg? These are fundamental problems in developmental biology. Epigenetic histone modifications play an important role in the maintenance of different cell phenotypes, yet the exact molecular mechanism for inheritance of the modification patterns over cell generations remains elusive. The complexity comes partly from the number of molecular species and the broad time scales involved. In recent years, mathematical modeling has made significant contributions to elucidating the molecular mechanisms of DNA methylation and histone covalent modification inheritance. We will pedagogically introduce the typical procedure and some technical details of performing a mathematical modeling study, and discuss future developments. Comment: 36 pages, 4 figures, 2 tables, book chapte

arXiv.org e-Print Archive

Crossref

Hierarchical coexistence of universality and diversity controls robustness and multi-functionality in intermediate filament protein networks

Author: Markus J. Buehler, Theodor Ackbarow
Publication date: 25/08/2007

Proteins constitute the elementary building blocks of a vast variety of biological materials such as cellular protein networks, spider silk or bone, where they create extremely robust, multi-functional materials by self-organization of structures over many length and time scales, from nano to macro. Some of the structural features are commonly found in many different tissues, that is, they are highly conserved. Examples of such universal building blocks include alpha-helices, beta-sheets or tropocollagen molecules. In contrast, other features are highly specific to tissue types, such as particular filament assemblies, beta-sheet nanocrystals in spider silk or tendon fascicles. These examples illustrate that the coexistence of universality and diversity – in the following referred to as the universality-diversity paradigm (UDP) – is an overarching feature in protein materials. This paradigm is a paradox: How can a structure be universal and diverse at the same time? In protein materials, the coexistence of universality and diversity is enabled by utilizing hierarchies, which serve as an additional dimension beyond the 3D or 4D physical space. This may be crucial to understanding how their structure and properties are linked, and how these materials are capable of combining seemingly disparate properties such as strength and robustness. Here we illustrate how the UDP makes it possible to unify universal building blocks and highly diversified patterns through the formation of hierarchical structures that lead to multi-functional, robust yet highly adapted structures. We illustrate these concepts in an analysis of three types of intermediate filament proteins: vimentin, lamin and keratin.

Nature Precedings

TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

Author: Cang Zixuan, Wei Guo-Wei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/03/2017

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts. Comment: 20 pages, 8 figures, 5 table
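As a rough illustration of how persistent homology reduces 3D geometry to 1D invariants, the sketch below computes only the 0-dimensional bars (connected components) of a Vietoris-Rips filtration on a point cloud. It is not the ESPH pipeline from the abstract: it ignores element types and higher-dimensional features, and the function name `h0_barcode` and the toy coordinates are assumptions made for this example.

```python
from itertools import combinations
from math import dist


def h0_barcode(points):
    """Death times of the finite H0 bars of a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A component dies
    when an edge of that length first merges it into another component.
    One extra bar (the final component) never dies and is not returned.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one bar ends
            deaths.append(length)
    return deaths


# Hypothetical toy coordinates, not taken from any structure file.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 0.0, 0.0)]
bars = h0_barcode(atoms)
print(bars)
```

The returned list is a 1D summary of the 3D point cloud. An element-specific variant in the spirit of the abstract would presumably run the same computation on subsets of atoms grouped by element type, but that step is not shown here.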

arXiv.org e-Print Archive

Directory of Open Access Journals

Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

Author: Xia Kelin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/07/2017

In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. In combination with a spectral graph method, I reveal the essential difference between global scale models and local scale ones in structure clustering, i.e., they optimize different quantities: Euclidean (or spatial) distances versus sequential (or genomic) distances. More specifically, clusters from global scale models optimize Euclidean distance relations. Local scale models, on the other hand, result in clusters that optimize the genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM bears a great similarity to the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated at different scales are not consistent and vary a lot. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries across different cluster numbers. Comment: 22 PAGES, 13 FIGURE
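To make the role of the Fiedler vector concrete, here is a minimal sketch of generic spectral bisection on a small contact graph. It is not SeqMM and has no sequence-scale parameter. The toy adjacency matrix, the function name `fiedler_vector`, and the use of shifted power iteration (chosen to avoid external libraries) are all assumptions made for this example.

```python
from math import sqrt


def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of the graph Laplacian L = D - A.

    Uses power iteration on (c*I - L) while projecting out the constant
    vector, so the iteration converges to the eigenvector of the second
    smallest Laplacian eigenvalue (assuming that eigenvalue is simple).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    c = 2 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    v = [float(i) for i in range(n)]  # any non-constant start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant eigenvector
        # w = (c*I - L) v = c*v - D*v + A*v
        w = [
            (c - deg[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical contact graph: two triangles (0,1,2) and (3,4,5)
# joined by a single edge between nodes 2 and 3.
contacts = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
v = fiedler_vector(contacts)
groups = [0 if x < 0 else 1 for x in v]  # split nodes by sign
print(groups)
```

Splitting nodes by the sign of the Fiedler vector yields a two-way partition, which is the sense in which the abstract links the vector to genomic compartments. Applying this to real Hi-C data would require a contact matrix and a normalization choice that the abstract does not specify.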

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

FigShare

Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Author: Cang Zixuan, Mu Lin, Wei Guowei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 27/08/2017

This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

Author: Mahoney Michael W.
Publication date: 08/10/2010

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples that drew on ideas from both areas: one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network). These examples may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

arXiv.org e-Print Archive

CiteSeerX