Structurally constrained protein evolution: results from a lattice simulation
We simulate the evolution of a protein-like sequence subject to point
mutations, imposing conservation of the ground state, thermodynamic stability
and fast folding. Our model is aimed at describing neutral evolution of natural
proteins. We use a cubic lattice model of the protein structure and test the
neutrality conditions by extensive Monte Carlo simulations. We observe that
sequence space is traversed by neutral networks, i.e. sets of sequences with
the same fold connected by point mutations. Typical pairs of sequences on a
neutral network are nearly as different as randomly chosen sequences. The
fraction of neutral neighbors has strong sequence to sequence variations, which
influence the rate of neutral evolution. In this paper we study the
thermodynamic stability of different protein sequences. We relate the high
variability of the fraction of neutral mutations to the complex energy
landscape within a neutral network, arguing that valleys in this landscape are
associated with high values of the neutral mutation rate. We find that when a
point mutation produces a sequence with a new ground state, this is likely to
have a low stability. Thus we tentatively conjecture that neutral networks of
different structures are typically well separated in sequence space. This
result indicates that significantly changing a protein structure through a
biologically acceptable chain of point mutations is a rare, although possible,
event. Comment: added reference, to appear in European Physical Journal
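The "fraction of neutral neighbors" discussed in the abstract above can be illustrated with a minimal sketch. It enumerates all point mutants of a sequence and counts those a caller-supplied test accepts. The names `point_mutants`, `neutral_fraction`, and `is_neutral`, and the 20-letter alphabet, are assumptions made for illustration; in the paper, neutrality is decided by lattice Monte Carlo simulations, which this sketch does not reproduce.

```python
from typing import Callable, Iterator

# Assumption: the 20 standard amino acids; the abstract does not state the alphabet.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def point_mutants(seq: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every sequence that differs from `seq` at exactly one position."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def neutral_fraction(
    seq: str,
    is_neutral: Callable[[str], bool],
    alphabet: str = ALPHABET,
) -> float:
    """Fraction of point mutants of `seq` accepted by `is_neutral`.

    `is_neutral` is a hypothetical stand-in for the paper's neutrality
    conditions (same ground state, stability, fast folding).
    """
    mutants = list(point_mutants(seq, alphabet))
    return sum(1 for m in mutants if is_neutral(m)) / len(mutants)
```

For example, with a toy two-letter alphabet, `neutral_fraction("HPH", lambda m: m[0] == "H", "HP")` counts the mutants that keep the first residue unchanged.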
Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
Spatially proximate amino acids in a protein tend to coevolve. A protein's
three-dimensional (3D) structure hence leaves an echo of correlations in the
evolutionary record. Reverse engineering 3D structures from such correlations
is an open problem in structural biology, pursued with increasing vigor as more
and more protein sequences continue to fill the data banks. Within this task
lies a statistical inference problem, rooted in the following: correlation
between two sites in a protein sequence can arise from firsthand interaction
but can also be network-propagated via intermediate sites; observed correlation
is not enough to guarantee proximity. To separate direct from indirect
interactions is an instance of the general problem of inverse statistical
mechanics, where the task is to learn model parameters (fields, couplings) from
observables (magnetizations, correlations, samples) in large systems. In the
context of protein sequences, the approach has been referred to as
direct-coupling analysis. Here we show that the pseudolikelihood method,
applied to 21-state Potts models describing the statistical properties of
families of evolutionarily related proteins, significantly outperforms existing
approaches to the direct-coupling analysis, the latter being based on standard
mean-field techniques. This improved performance also relies on a modified
score for the coupling strength. The results are verified using known crystal
structures of specific sequence instances of various protein families. Code
implementing the new method can be found at http://plmdca.csc.kth.se/. Comment: 19 pages, 16 figures, published version
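As a rough illustration of the pseudolikelihood idea named in the abstract above, the sketch below evaluates the negative log-pseudolikelihood of a set of sequences under a Potts model with fields `h` and couplings `J`. The function names and the nested-list parameter layout are assumptions, not the authors' code. Regularization, sequence reweighting, and the modified coupling score mentioned in the abstract are deliberately left out.

```python
import math


def site_conditional_logprob(seq, r, h, J):
    """log P(seq[r] | all other sites) under a Potts model.

    Assumed layout: h[i][a] is the field for state a at site i, and
    J[i][j][a][b] is the coupling between state a at i and state b at j.
    """
    q = len(h[r])
    # Unnormalized log-weight of every candidate state a at site r.
    energies = [
        h[r][a] + sum(J[r][j][a][seq[j]] for j in range(len(seq)) if j != r)
        for a in range(q)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(energies)
    log_z = m + math.log(sum(math.exp(e - m) for e in energies))
    return energies[seq[r]] - log_z


def neg_log_pseudolikelihood(samples, h, J):
    """Average over samples of the summed per-site negative log-conditionals."""
    total = 0.0
    for seq in samples:
        for r in range(len(seq)):
            total -= site_conditional_logprob(seq, r, h, J)
    return total / len(samples)
```

Minimizing this quantity over `h` and `J` replaces the intractable global partition function with one small normalization per site, which is what makes the approach practical for 21-state models.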
Evolutionary decision rules for predicting protein contact maps
Protein structure prediction is currently one of
the main open challenges in Bioinformatics. The protein
contact map is a useful and commonly used representation
of protein 3D structure; it records binary
proximities (contact or non-contact) between each pair of
amino acids of a protein. In this work, we propose a
multi-objective evolutionary approach for contact map prediction
based on physico-chemical properties of amino acids. The
evolutionary algorithm produces a set of decision rules that
identifies contacts between amino acids. The rules obtained
by the algorithm impose a set of conditions based on amino
acid properties to predict contacts. We present results
obtained by our approach on four different protein data
sets. A statistical study was also performed to extract valid
conclusions from the set of prediction rules generated by
our algorithm. Results obtained confirm the validity of our
proposal.
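To make the contact map representation described above concrete, here is a minimal sketch that builds a binary map from 3D residue coordinates. The function name `contact_map` and the 8.0 distance threshold are assumptions for illustration; the abstract does not state which contact definition the authors use, and the sketch does not implement their evolutionary rule learner.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map: entry [i][j] is 1 if residues i and j are in contact.

    `coords` holds one (x, y, z) point per residue. The threshold is an
    assumed cutoff in the same units as the coordinates. A residue is not
    counted as being in contact with itself.
    """
    n = len(coords)
    return [
        [
            1 if i != j and math.dist(coords[i], coords[j]) <= threshold else 0
            for j in range(n)
        ]
        for i in range(n)
    ]
```

The result is symmetric, so a predictor only needs to decide the pairs with `i < j`.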