Euclidean distance geometry and applications
Euclidean distance geometry is the study of Euclidean geometry based on the
concept of distance. This is useful in several applications where the input
data consists of an incomplete set of distances, and the output is a set of
points in Euclidean space that realizes the given distances. We survey some of
the theory of Euclidean distance geometry and some of the most important
applications: molecular conformation, localization of sensor networks and
statics. Comment: 64 pages, 21 figure
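As a minimal illustration of the realization problem described in this abstract (distances in, points out), the sketch below places three points in the plane from their three pairwise distances. The function name `realize_triangle` and the 3-4-5 example are illustrative assumptions, not taken from the survey, which treats the much harder case of incomplete distance sets in general dimension.

```python
import math


def realize_triangle(d01, d02, d12):
    """Return planar coordinates for three points with the given pairwise distances.

    Hypothetical helper for illustration only. Point 0 is pinned at the origin
    and point 1 on the positive x-axis, which removes the freedom of rigid motions.
    """
    if d01 <= 0 or d02 < 0 or d12 < 0:
        raise ValueError("d01 must be positive and all distances non-negative")
    if d01 > d02 + d12 or d02 > d01 + d12 or d12 > d01 + d02:
        raise ValueError("distances violate the triangle inequality")
    p0 = (0.0, 0.0)
    p1 = (d01, 0.0)
    # Law of cosines gives the x-coordinate of point 2; y follows from d02.
    x = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2.0 * d01)
    y = math.sqrt(max(d02 ** 2 - x ** 2, 0.0))
    return p0, p1, (x, y)


def distance(p, q):
    """Euclidean distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


points = realize_triangle(3.0, 4.0, 5.0)
```

The realization is unique only up to translation, rotation and reflection, which is why the first two points can be fixed by convention.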
Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for the
biomolecular data analysis. With the combination of spectral graph method, I
reveal the essential difference between the global scale models and local scale
ones in structure clustering, i.e., different optimization on Euclidean (or
spatial) distances and sequential (or genomic) distances. More specifically,
clusters from global scale models optimize Euclidean distance relations. Local
scale models, on the other hand, result in clusters that optimize the genomic
distance relations. For biomolecular data, Euclidean distances and sequential
distances are two independent variables, which can never be optimized
simultaneously in data clustering. However, the sequence scale in my SeqMM can work
as a tuning parameter that balances these two variables and delivers different
clusterings based on my purposes. Further, my SeqMM is used to explore the
hierarchical structures of chromosomes. I find that in global scale, the
Fiedler vector from my SeqMM bears a great similarity with the principal vector
from principal component analysis, and can be used to study genomic
compartments. In TAD analysis, I find that TADs evaluated from different scales
are not consistent and vary a lot. Particularly when the sequence scale is
small, the calculated TAD boundaries are dramatically different. Even for
regions with high contact frequencies, TAD regions show no obvious consistency.
However, when the scale value increases further, although TADs are still quite
different, TAD boundaries in these high contact frequency regions become more
and more consistent. Finally, I find that for a fixed local scale, my method
can deliver very robust TAD boundaries in different cluster numbers. Comment: 22 PAGES, 13 FIGURE
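The abstract above relies on the Fiedler vector from spectral graph theory. The sketch below is not the SeqMM itself: it only shows, under stated assumptions, how a Fiedler vector can be obtained from a symmetric weight matrix and used to split nodes into two groups. The toy matrix `contacts` (two tightly connected groups of three loci joined by one weak contact) and the function name `fiedler_vector` are hypothetical, and plain power iteration is used here only to keep the example dependency-free.

```python
import math


def fiedler_vector(weights, iterations=500):
    """Approximate the Fiedler vector of the graph Laplacian L = D - W.

    Assumes `weights` is a symmetric matrix with zero diagonal describing a
    connected graph. Power iteration is run on (shift * I - L) while projecting
    out the constant vector, so the iteration converges to the eigenvector of
    the second-smallest Laplacian eigenvalue.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # Twice the maximum degree bounds the largest Laplacian eigenvalue.
    shift = 2.0 * max(degrees)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        w = [
            (shift - degrees[i]) * v[i]
            + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mean = sum(v) / n
    return [x - mean for x in v]


# Hypothetical contact weights: loci 0-2 and loci 3-5 form two dense groups.
contacts = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]

fiedler = fiedler_vector(contacts)
# The sign of each entry assigns the locus to one of two groups.
groups = [x > 0 for x in fiedler]
```

In practice a dense or sparse eigensolver would replace the power iteration; the sign-based split is the part that corresponds to the two-way clustering discussed in the abstract.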
Kinetic distance and kinetic maps from molecular dynamics simulation
Characterizing macromolecular kinetics from molecular dynamics (MD)
simulations requires a distance metric that can distinguish
slowly-interconverting states. Here we build upon diffusion map theory and
define a kinetic distance for irreducible Markov processes that quantifies how
slowly molecular conformations interconvert. The kinetic distance can be
computed given a model that approximates the eigenvalues and eigenvectors
(reaction coordinates) of the MD Markov operator. Here we employ the
time-lagged independent component analysis (TICA). The TICA components can be
scaled to provide a kinetic map in which the Euclidean distance corresponds to
the kinetic distance. As a result, the question of how many TICA dimensions
should be kept in a dimensionality reduction approach becomes obsolete, and one
parameter less needs to be specified in the kinetic model construction. We
demonstrate the approach using TICA and Markov state model (MSM) analyses for
illustrative models, protein conformation dynamics in bovine pancreatic trypsin
inhibitor and protein-inhibitor association in trypsin and benzamidine.
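The scaling step mentioned in this abstract can be sketched as follows. The abstract does not state the scaling factor, so the sketch assumes each component is multiplied by its eigenvalue, which makes the plain Euclidean distance in the scaled space equal to a weighted distance in which slow processes (eigenvalues near 1) dominate. The function names and the numbers are illustrative assumptions.

```python
import math


def kinetic_map(components, eigenvalues):
    """Scale each component by its eigenvalue (assumed scaling, see lead-in)."""
    return [lam * c for lam, c in zip(eigenvalues, components)]


def kinetic_distance(x1, x2, eigenvalues):
    """Euclidean distance between two conformations in the scaled space."""
    y1 = kinetic_map(x1, eigenvalues)
    y2 = kinetic_map(x2, eigenvalues)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)))


# Hypothetical eigenvalues, ordered from slow to fast processes.
eigenvalues = [0.9, 0.5, 0.1]
conf_a = [1.0, 0.0, 2.0]
conf_b = [-1.0, 0.0, -2.0]

d_kinetic = kinetic_distance(conf_a, conf_b, eigenvalues)
```

Because fast components are multiplied by small eigenvalues, they contribute little to the distance, which is consistent with the abstract's remark that the choice of how many dimensions to keep becomes unimportant.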
Quantum mechanical calculation of the effects of stiff and rigid constraints in the conformational equilibrium of the Alanine dipeptide
If constraints are imposed on a macromolecule, two inequivalent classical
models may be used: the stiff and the rigid one. This work studies the effects
of such constraints on the Conformational Equilibrium Distribution (CED) of the
model dipeptide HCO-L-Ala-NH2 without any simplifying assumption. We use ab
initio Quantum Mechanics calculations including electron correlation at the MP2
level to describe the system, and we measure the conformational dependence of
all the correcting terms to the naive CED based on the Potential Energy Surface
(PES) that appear when the constraints are considered. These terms are related
to mass-metric tensor determinants and also occur in Fixman's compensating
potential. We show that some of the corrections are non-negligible if one is
interested in the whole Ramachandran space. On the other hand, if only the
energetically lower region, containing the principal secondary structure
elements, is assumed to be relevant, then all correcting terms may be
neglected up to peptides of considerable length. This is the first time, as far
as we know, that the analysis of the conformational dependence of these
correcting terms is performed in a relevant biomolecule with a realistic
potential energy function. Comment: 37 pages, 4 figures, LaTeX, BibTeX, AMSTe
Encounter complexes and dimensionality reduction in protein-protein association
An outstanding challenge has been to understand the mechanism whereby proteins associate. We report here the results of exhaustively sampling the conformational space in protein-protein association using a physics-based energy function. The agreement between experimental intermolecular paramagnetic relaxation enhancement (PRE) data and the PRE profiles calculated from the docked structures shows that the method captures both specific and non-specific encounter complexes. To explore the energy landscape in the vicinity of the native structure, the nonlinear manifold describing the relative orientation of two solid bodies is projected onto a Euclidean space in which the shape of low energy regions is studied by principal component analysis. Results show that the energy surface is canyon-like, with a smooth funnel within a two dimensional subspace capturing over 75% of the total motion. Thus, proteins tend to associate along preferred pathways, similar to sliding of a protein along DNA in the process of protein-DNA recognition.
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
Protein Docking by the Underestimation of Free Energy Funnels in the Space of Encounter Complexes
Similarly to protein folding, the association of two proteins is driven
by a free energy funnel, determined by favorable interactions in some neighborhood of the
native state. We describe a docking method based on stochastic global minimization of
funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting
for flexibility of the interface side chains. The method, called semi-definite
programming-based underestimation (SDU), employs a general quadratic function to
underestimate a set of local energy minima and uses the resulting underestimator to bias
further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its
application to docking in the rotational and translational space SE(3) is not
straightforward due to the geometry of that space. We introduce a strategy that uses
separate independent variables for side-chain optimization, center-to-center distance of the
two proteins, and five angular descriptors of the relative orientations of the molecules.
The removal of the center-to-center distance turns out to vastly improve the efficiency of
the search, because the five-dimensional space now exhibits a well-behaved energy surface
suitable for underestimation. This algorithm explores the free energy surface spanned by
encounter complexes that correspond to local free energy minima and shows similarity to the
model of macromolecular association that proceeds through a series of collisions. Results
for standard protein docking benchmarks establish that in this space the free energy
landscape is a funnel in a reasonably broad neighborhood of the native state and that the
SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα
root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared
to Monte Carlo methods.
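The accuracy criterion quoted in this abstract is a root-mean-square deviation over atom positions. The sketch below computes a plain RMSD between two equally sized coordinate lists, assuming the two structures are already expressed in a common frame (no superposition is performed) and that the lists contain matching atoms in the same order. The coordinates and the name `rmsd` are made up for illustration and do not come from the paper's benchmarks.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical atom positions in angstroms.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]

value = rmsd(predicted, native)
# Compare against the 5 angstrom threshold mentioned in the abstract.
within_threshold = value < 5.0
```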