Computational modeling to elucidate molecular mechanisms of epigenetic memory
How do mammalian cells that share the same genome exist in notably distinct
phenotypes, exhibiting differences in morphology, gene expression patterns, and
epigenetic chromatin statuses? Furthermore, how do cells of different phenotypes
differentiate reproducibly from a single fertilized egg? These are fundamental
problems in developmental biology. Epigenetic histone modifications play an
important role in the maintenance of different cell phenotypes. The exact
molecular mechanism for inheritance of the modification patterns over cell
generations remains elusive. The complexity comes partly from the number of
molecular species and the broad time scales involved. In recent years,
mathematical modeling has made significant contributions to elucidating the
molecular mechanisms of DNA methylation and histone covalent modification
inheritance. We will pedagogically introduce the typical procedure and some
technical details of performing a mathematical modeling study, and discuss
future developments.
Comment: 36 pages, 4 figures, 2 tables, book chapter
Hierarchical coexistence of universality and diversity controls robustness and multi-functionality in intermediate filament protein networks
Proteins constitute the elementary building blocks of a vast variety of biological materials such as cellular protein networks, spider silk or bone, where they create extremely robust, multi-functional materials by self-organization of structures over many length and time scales, from nano to macro. Some of the structural features are commonly found in many different tissues, that is, they are highly conserved. Examples of such universal building blocks include alpha-helices, beta-sheets or tropocollagen molecules. In contrast, other features are highly specific to tissue types, such as particular filament assemblies, beta-sheet nanocrystals in spider silk or tendon fascicles. These examples illustrate that the coexistence of universality and diversity – in the following referred to as the universality-diversity paradigm (UDP) – is an overarching feature in protein materials. This paradigm is a paradox: How can a structure be universal and diverse at the same time? In protein materials, the coexistence of universality and diversity is enabled by utilizing hierarchies, which serve as an additional dimension beyond the 3D or 4D physical space. This may be crucial to understand how their structure and properties are linked, and how these materials are capable of combining seemingly disparate properties such as strength and robustness. Here we illustrate how the UDP makes it possible to unify universal building blocks and highly diversified patterns through the formation of hierarchical structures that lead to multi-functional, robust yet highly adapted structures. We illustrate these concepts in an analysis of three types of intermediate filament proteins: vimentin, lamin and keratin.
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.
Comment: 20 pages, 8 figures, 5 tables
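The idea of reducing 3D geometry to 1D topological invariants per element type can be illustrated with a deliberately small sketch. It is not the ESPH or TopologyNet implementation: it computes only 0-dimensional persistence (connected components) of a Vietoris-Rips filtration, groups atoms by a single element label, and uses the convention that an edge enters the filtration at the full inter-point distance. The function names and the toy `atoms` list are hypothetical.

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Death scales of 0-dimensional classes in a Vietoris-Rips filtration.

    Every point is born at scale 0. A connected component dies at the length
    of the edge that merges it into another component. The one component that
    never dies is omitted from the result.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge, one class dies
            deaths.append(length)
    return deaths


def element_specific_h0(atoms):
    """Group atoms by element label and compute H0 death times per group."""
    by_element = {}
    for element, xyz in atoms:
        by_element.setdefault(element, []).append(xyz)
    return {el: h0_death_times(pts) for el, pts in by_element.items()}


# Hypothetical toy coordinates, not taken from any real structure.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.0, 0.0, 0.0)),
    ("C", (5.0, 0.0, 0.0)),
    ("N", (0.0, 2.0, 0.0)),
    ("N", (0.0, 5.0, 0.0)),
]
barcodes = element_specific_h0(atoms)
```

Each element yields a short list of numbers that is independent of rotation and translation, which is the sense in which a 3D point cloud becomes a 1D descriptor. Turning such lists into the multichannel image input described in the abstract would require binning and higher-dimensional invariants that this sketch leaves out.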
Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for
biomolecular data analysis. By combining it with the spectral graph method, I
reveal the essential difference between the global scale models and local scale
ones in structure clustering, i.e., different optimization on Euclidean (or
spatial) distances and sequential (or genomic) distances. More specifically,
clusters from global scale models optimize Euclidean distance relations. Local
scale models, on the other hand, result in clusters that optimize the genomic
distance relations. For biomolecular data, Euclidean distances and sequential
distances are two independent variables, which can never be optimized
simultaneously in data clustering. However, the sequence scale in my SeqMM can
work as a tuning parameter that balances these two variables and delivers
different clusterings depending on the purpose. Further, my SeqMM is used to
explore the hierarchical structures of chromosomes. I find that at the global scale, the
Fiedler vector from my SeqMM bears a great similarity with the principal vector
from principal component analysis, and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated from different scales
are not consistent and vary a lot. Particularly when the sequence scale is
small, the calculated TAD boundaries are dramatically different. Even for
regions with high contact frequencies, TAD regions show no obvious consistency.
However, when the scale value increases further, although TADs are still quite
different, TAD boundaries in these high contact frequency regions become more
and more consistent. Finally, I find that for a fixed local scale, my method
can deliver very robust TAD boundaries for different cluster numbers.
Comment: 22 pages, 13 figures
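The Fiedler vector mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain unnormalized graph Laplacian built directly from a symmetric contact matrix; the sequence-scale kernel that defines SeqMM itself is not described in the abstract and is not reproduced here. The helper names and the toy two-block matrix are hypothetical.

```python
import numpy as np


def fiedler_vector(contacts):
    """Eigenvector of the second-smallest eigenvalue of the graph Laplacian.

    `contacts` is treated as a non-negative weight matrix between genomic
    bins. It is symmetrized and its diagonal is ignored.
    """
    w = np.asarray(contacts, dtype=float)
    w = (w + w.T) / 2.0
    np.fill_diagonal(w, 0.0)
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1]


def toy_contact_matrix(n_bins=8, strong=10.0, weak=0.1):
    """Hypothetical contact map with two strongly self-interacting blocks."""
    w = np.full((n_bins, n_bins), weak)
    half = n_bins // 2
    w[:half, :half] = strong
    w[half:, half:] = strong
    return w


v = fiedler_vector(toy_contact_matrix())
# The sign of each entry assigns a bin to one of two groups. The overall sign
# of an eigenvector is arbitrary, so only the grouping is meaningful.
labels = (v > 0).astype(int)
```

On this toy matrix the sign pattern separates the first four bins from the last four, which mirrors how a leading spectral vector is used to split a contact map into two compartments.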
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201