Potentials of Mean Force for Protein Structure Prediction Vindicated, Formalized and Generalized
Understanding protein structure is of crucial importance in science, medicine
and biotechnology. For about two decades, knowledge-based potentials built on
pairwise distances -- so-called "potentials of mean force" (PMFs) -- have been
center stage in the prediction and design of protein structure and the
simulation of protein folding. However, the validity, scope and limitations of
these potentials are still vigorously debated and disputed, and the optimal
choice of the reference state -- a necessary component of these potentials --
is an unsolved problem. PMFs are loosely justified by analogy to the reversible
work theorem in statistical physics, or by a statistical argument based on a
likelihood function. Both justifications are insightful but leave many
questions unanswered. Here, we show for the first time that PMFs can be seen as
approximations to quantities that do have a rigorous probabilistic
justification: they naturally arise when probability distributions over
different features of proteins need to be combined. We call these quantities
reference ratio distributions; they derive from the application of the reference
ratio method. This new view is not only of theoretical relevance, but also leads to
many insights that are of direct practical use: the reference state is uniquely
defined and does not require external physical insights; the approach can be
generalized beyond pairwise distances to arbitrary features of protein
structure; and it becomes clear for which purposes the use of these quantities
is justified. We illustrate these insights with two applications, involving the
radius of gyration and hydrogen bonding. In the latter case, we also show how
the reference ratio method can be iteratively applied to sculpt an energy
funnel. Our results considerably increase the understanding and scope of energy
functions derived from known biomolecular structures.
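The idea of combining distributions over different features can be illustrated with a small sketch. It assumes the reference ratio form P(x) ∝ Q(x) · P(f(x)) / Q(f(x)), where Q is a sampling distribution over structures x, f is a coarse feature, P(f) is the desired distribution of that feature, and Q(f) (the feature's distribution under Q) plays the role of the reference state. The function names, the toy "structures" and the parity feature are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter


def reference_ratio_weights(samples, feature, target_probs):
    """Return w(x) = P(f(x)) / Q(f(x)) for each sample x drawn from Q.

    Q(f) is estimated empirically from the samples themselves, so the
    reference state needs no external physical input.
    """
    feature_values = [feature(x) for x in samples]
    counts = Counter(feature_values)
    n = len(samples)
    q_f = {f: c / n for f, c in counts.items()}
    return [target_probs.get(f, 0.0) / q_f[f] for f in feature_values]


def pmf_energy(p_target, q_reference, kT=1.0):
    """PMF-style energy -kT * ln(P(f) / Q(f)) for a single feature value."""
    return -kT * math.log(p_target / q_reference)


def parity(x):
    """Hypothetical coarse feature: 0 for even, 1 for odd."""
    return x % 2


# Toy data (assumption): "structures" are the integers 0..9, sampled
# uniformly by Q, so Q(even) = Q(odd) = 0.5.
samples = list(range(10))
# Desired feature distribution P(f): 80% even, 20% odd.
target = {0: 0.8, 1: 0.2}

weights = reference_ratio_weights(samples, parity, target)
total = sum(weights)
reweighted_even = sum(
    w for x, w in zip(samples, weights) if parity(x) == 0
) / total

print("reweighted P(even):", reweighted_even)
print("energy of 'even':", pmf_energy(0.8, 0.5))
print("energy of 'odd':", pmf_energy(0.2, 0.5))
```

After reweighting, the samples reproduce the target feature distribution while keeping whatever finer detail Q already encodes. Feature values that are over-represented in the target relative to the reference receive a negative energy.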
Deciphering the Preference and Predicting the Viability of Circular Permutations in Proteins
Circular permutation (CP) refers to situations in which the termini of a protein are relocated to other positions in the structure. CP occurs naturally and has been artificially created to study protein function, stability and folding. CP is increasingly applied to engineer enzyme structure and function, and to create bifunctional fusion proteins unachievable by tandem fusion. However, CP is a complicated and expensive technique, and an intrinsic difficulty in its application is that not every position in a protein is amenable to creating a viable permutant. To examine the preferences of CP and develop CP viability prediction methods, we carried out comprehensive analyses of the sequence, structural, and dynamical properties of known CP sites using a variety of statistical and simulation methods, such as bootstrap aggregating, permutation tests and molecular dynamics simulations. CP particularly favors Gly, Pro, Asp and Asn. Positions preferred by CP lie within coils, loops, turns, and at residues that are exposed to solvent, weakly hydrogen-bonded, environmentally unpacked, or flexible. Disfavored positions include Cys, bulky hydrophobic residues, and residues located within helices or near the protein's core. These results fostered the development of an effective viable CP site prediction system, which combines four machine learning methods: artificial neural networks, a support vector machine, a random forest, and a hierarchical feature integration procedure developed in this work. Assessed on the dihydrofolate reductase dataset as an independent evaluation set, this prediction system achieved an AUC of 0.9. Large-scale predictions have been performed for nine thousand representative protein structures; several new potential applications of CP were thus identified. Many previously unreported preferences of CP are revealed in this study. The developed system is the best CP viability prediction method currently available. This work will facilitate the application of CP in research and biotechnology.
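To make the AUC figure concrete, the following minimal sketch computes the area under the ROC curve as the fraction of (viable, non-viable) site pairs that a scorer ranks correctly, and combines two scorers by plain averaging. The averaging step is a simple stand-in, not the hierarchical feature integration procedure described in the abstract, and all labels, scores and names (`model_a`, `model_b`) are made-up assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_scores(*score_lists):
    """Combine several scorers by averaging their per-site scores."""
    return [sum(column) / len(column) for column in zip(*score_lists)]


# Hypothetical data: 1 = viable CP site, 0 = non-viable site.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]   # scores from one scorer
model_b = [0.7, 0.4, 0.9, 0.3, 0.45, 0.5]  # scores from another scorer

ensemble = average_scores(model_a, model_b)

print("AUC model_a :", roc_auc(labels, model_a))
print("AUC model_b :", roc_auc(labels, model_b))
print("AUC ensemble:", roc_auc(labels, ensemble))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported value of 0.9 means that about nine out of ten viable/non-viable pairs are ordered correctly.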
The MULTICOM toolbox for protein structure prediction
Background: As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (~75,000, or < 0.07%) have tertiary structures solved by experimental techniques. The gap between protein sequence and structure continues to widen rapidly because the throughput of genome sequencing is much higher than that of protein structure determination. Computational tools for predicting protein structure and structural features from protein sequences are crucial for making use of this vast repository of protein resources.
Results: To meet this need, we have developed the comprehensive MULTICOM toolbox, a set of protein structure and structural feature prediction tools. These tools cover secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction.
Conclusions: These tools have been rigorously tested by many users over the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near state-of-the-art performance. To facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.
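As a quick check of the sequence-structure gap quoted in the Background, the short calculation below recomputes the percentage from the two counts given there. The variable names are illustrative, and the counts are the approximate figures stated in the abstract.

```python
# Approximate counts quoted in the abstract.
total_sequences = 108_000_000   # "over 108 million" protein sequences
solved_structures = 75_000      # "~75,000" experimentally solved structures

# Share of sequences with a solved tertiary structure, in percent.
fraction_percent = 100 * solved_structures / total_sequences

print(f"{fraction_percent:.4f}% of sequences have a solved structure")
```

The result is consistent with the "< 0.07%" stated in the abstract.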