Origin of Scaling Behavior of Protein Packing Density: A Sequential Monte Carlo Study of Compact Long Chain Polymers
Single domain proteins are thought to be tightly packed. The introduction of
voids by mutations is often regarded as destabilizing. In this study we show
that packing density for single domain proteins decreases with chain length. We
find that the radius of gyration provides a poor description of protein packing
but the alpha contact number we introduce here characterizes proteins well. We
further demonstrate that protein-like scaling relationship between packing
density and chain length is observed in off-lattice self-avoiding walks. A key
problem in studying compact chain polymers is the attrition problem: it is
difficult to generate independent samples of compact long self-avoiding walks.
We develop an algorithm based on the framework of sequential Monte Carlo and
succeed in generating populations of compact long chain off-lattice polymers up
to length . Results based on analysis of these chain polymers suggest
that maintaining high packing density is only characteristic of short chain
proteins. We found that the scaling behavior of packing density with chain
length of proteins is a generic feature of random polymers satisfying a loose
constraint on compactness. We conclude that proteins are not optimized by
evolution to eliminate packing voids. Comment: 9 pages, 10 figures. Accepted by J. Chem. Phy
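The attrition problem and the sequential Monte Carlo idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract concerns off-lattice chains, whereas this example assumes a 2D square lattice and plain Rosenbluth-style growth with importance weights, and the function names (`grow_saw`, `radius_of_gyration_sq`, `weighted_mean_rg2`) are hypothetical.

```python
import random


def grow_saw(n_steps, rng):
    """Grow one self-avoiding walk on the 2D square lattice, one site at a time.

    Returns (path, weight). The weight is the product of the number of free
    neighbouring sites at each growth step (the Rosenbluth importance weight).
    A weight of 0.0 means the walk was trapped before reaching full length,
    which is the attrition problem in miniature.
    """
    path = [(0, 0)]
    occupied = {(0, 0)}
    weight = 1.0
    for _ in range(n_steps):
        x, y = path[-1]
        neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        free = [p for p in neighbours if p not in occupied]
        if not free:
            return path, 0.0  # dead end: no unoccupied neighbour left
        weight *= len(free)
        nxt = rng.choice(free)
        path.append(nxt)
        occupied.add(nxt)
    return path, weight


def radius_of_gyration_sq(path):
    """Squared radius of gyration of the sites in a walk."""
    n = len(path)
    cx = sum(x for x, _ in path) / n
    cy = sum(y for _, y in path) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in path) / n


def weighted_mean_rg2(n_steps, n_samples, seed=0):
    """Importance-weighted estimate of the mean squared radius of gyration."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        path, w = grow_saw(n_steps, rng)
        num += w * radius_of_gyration_sq(path)  # trapped walks contribute 0
        den += w
    return num / den
```

Growing chains step by step and reweighting avoids discarding almost every sample, which is what happens when long walks are generated blindly and rejected on the first overlap.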
Heuristic Refinement Method for the Derivation of Protein Solution Structures: Validation on Cytochrome B562
A method is described for determining the family of protein structures compatible with solution data obtained primarily from nuclear magnetic resonance (NMR) spectroscopy. Starting with all possible conformations, the method systematically excludes conformations until the remaining structures are only those compatible with the data. The apparent computational intractability of this approach is reduced by assembling the protein in pieces, by considering the protein at several levels of abstraction, by utilizing constraint satisfaction methods to consider only a few atoms at a time, and by utilizing artificial intelligence methods of heuristic control to decide which actions will exclude the most conformations. Example results are presented for simulated NMR data from the known crystal structure of cytochrome b562 (103 residues). For 10 sample backbones an average root-mean-square deviation from the crystal of 4.1 Å was found for all alpha-carbon atoms and 2.8 Å for helix alpha-carbons alone. The 10 backbones define the family of all structures compatible with the data and provide nearly correct starting structures for adjustment by any of the current structure determination methods.
Protein folding tames chaos
Protein folding produces characteristic and functional three-dimensional
structures from unfolded polypeptides or disordered coils. The emergence of
extraordinary complexity in the protein folding process poses astonishing
challenges to theoretical modeling and computer simulations. The present work
introduces molecular nonlinear dynamics (MND), or molecular chaotic dynamics,
as a theoretical framework for describing and analyzing protein folding. We
unveil the existence of intrinsically low dimensional manifolds (ILDMs) in the
chaotic dynamics of folded proteins. Additionally, we reveal that the
transition from disordered to ordered conformations in protein folding
increases the transverse stability of the ILDM. Stated differently, protein
folding reduces the chaoticity of the nonlinear dynamical system, and a folded
protein has the best ability to tame chaos. Additionally, we bring to light the
connection between the ILDM stability and the thermodynamic stability, which
enables us to quantify the disorderliness and relative energies of folded,
misfolded and unfolded protein states. Finally, we exploit chaos for protein
flexibility analysis and develop a robust chaotic algorithm for the prediction
of Debye-Waller factors, or temperature factors, of protein structures.
Euclidean distance geometry and applications
Euclidean distance geometry is the study of Euclidean geometry based on the
concept of distance. This is useful in several applications where the input
data consists of an incomplete set of distances, and the output is a set of
points in Euclidean space that realizes the given distances. We survey some of
the theory of Euclidean distance geometry and some of the most important
applications: molecular conformation, localization of sensor networks and
statics. Comment: 64 pages, 21 figures
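The smallest nontrivial instance of the problem this abstract describes (given distances, find points that realize them) is placing three points in the plane. The sketch below is only an illustration of that idea, not a method taken from the survey; the function name `realize_triangle` is hypothetical.

```python
import math


def realize_triangle(d01, d02, d12):
    """Place three points in the plane so their pairwise distances match.

    Point 0 is fixed at the origin and point 1 on the positive x-axis; the
    law of cosines then gives the coordinates of point 2. Raises ValueError
    when the distances cannot be realized (triangle inequality violated).
    """
    if d01 <= 0:
        raise ValueError("d01 must be positive")
    x = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2 * d01)
    y_sq = d02 ** 2 - x ** 2
    if y_sq < -1e-12:
        raise ValueError("distances have no planar realization")
    y = math.sqrt(max(y_sq, 0.0))  # clamp tiny negative rounding errors
    return [(0.0, 0.0), (float(d01), 0.0), (x, y)]
```

For example, `realize_triangle(3, 4, 5)` places the third point at `(0.0, 4.0)`. The realization is unique only up to rotation, translation, and reflection, which is why the first two points can be fixed freely.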
Empirical Potential Function for Simplified Protein Models: Combining Contact and Local Sequence-Structure Descriptors
An effective potential function is critical for protein structure prediction
and folding simulation. Simplified protein models such as those requiring only
Cα or backbone atoms are attractive because they enable efficient
search of the conformational space. We show residue specific reduced discrete
state models can represent the backbone conformations of proteins with small
RMSD values. However, no potential functions exist that are designed for such
simplified protein models. In this study, we develop optimal potential
functions by combining contact interaction descriptors and local
sequence-structure descriptors. The form of the potential function is a
weighted linear sum of all descriptors, and the optimal weight coefficients are
obtained through optimization using both native and decoy structures. The
performance of the potential function in tests of discriminating native protein
structures from decoys is evaluated using several benchmark decoy sets. Our
potential function requiring only backbone atoms or Cα atoms has
comparable or better performance than several residue-based potential functions
that require additional coordinates of side chain centers or coordinates of all
side chain atoms. By reducing the residue alphabets down to size 5 for local
structure-sequence relationship, the performance of the potential function can
be further improved. Our results also suggest that local sequence-structure
correlation may play an important role in reducing the entropic cost of protein
folding. Comment: 20 pages, 5 figures, 4 tables. In press, Protein
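A potential of the form described in this abstract, a weighted linear sum of descriptors with weights fitted on native and decoy structures, can be sketched as follows. The abstract does not say which optimization method the authors use; the perceptron-style update here is an assumed stand-in, and the names `energy` and `fit_weights` and the toy descriptor vectors are hypothetical.

```python
def energy(weights, descriptors):
    """Weighted linear sum of descriptor values (lower means more native-like)."""
    return sum(w * d for w, d in zip(weights, descriptors))


def fit_weights(pairs, n_features, epochs=100, lr=0.1):
    """Fit weights so each native structure scores below its decoy.

    pairs: list of (native_descriptors, decoy_descriptors) tuples.
    Uses a perceptron-style update (an assumption, not the paper's method):
    whenever a native does not score strictly lower than its decoy, move the
    weights against the descriptor difference (native - decoy).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        updated = False
        for native, decoy in pairs:
            if energy(weights, native) >= energy(weights, decoy):
                weights = [
                    w - lr * (n - d) for w, n, d in zip(weights, native, decoy)
                ]
                updated = True
        if not updated:
            break  # every native already scores below its decoy
    return weights
```

With toy data such as `pairs = [([3.0, 1.0], [1.0, 2.0]), ([2.0, 0.5], [1.0, 1.5])]`, `fit_weights(pairs, 2)` returns weights under which both natives have lower energy than their decoys.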
Protein Docking by the Underestimation of Free Energy Funnels in the Space of Encounter Complexes
Similarly to protein folding, the association of two proteins is driven
by a free energy funnel, determined by favorable interactions in some neighborhood of the
native state. We describe a docking method based on stochastic global minimization of
funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting
for flexibility of the interface side chains. The method, called semi-definite
programming-based underestimation (SDU), employs a general quadratic function to
underestimate a set of local energy minima and uses the resulting underestimator to bias
further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its
application to docking in the rotational and translational space SE(3) is not
straightforward due to the geometry of that space. We introduce a strategy that uses
separate independent variables for side-chain optimization, center-to-center distance of the
two proteins, and five angular descriptors of the relative orientations of the molecules.
The removal of the center-to-center distance turns out to vastly improve the efficiency of
the search, because the five-dimensional space now exhibits a well-behaved energy surface
suitable for underestimation. This algorithm explores the free energy surface spanned by
encounter complexes that correspond to local free energy minima and shows similarity to the
model of macromolecular association that proceeds through a series of collisions. Results
for standard protein docking benchmarks establish that in this space the free energy
landscape is a funnel in a reasonably broad neighborhood of the native state and that the
SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα
root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared
to Monte Carlo methods.
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
LoopIng: A template-based tool for predicting the structure of protein loops
MOTIVATION:
Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function.
RESULTS:
We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4-10 residues) and significant enhancements for long loops (11-20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop).