Potentials of Mean Force for Protein Structure Prediction Vindicated, Formalized and Generalized

A Ben-Naim; A Colubri; A Finkelstein; A Shmygelska; B Bernard; B Fain; B Hesselbo; C Kirtay; C Zhang; CB Anfinsen; Christian Andreetta; D Chandler; D Eramian; D Gilis; D McQuarrie; D Reith; D Rykunov; D Shortle; Darren R. Flower; DT Jones; F Zhao; GR Bowman; H Gohlke; H Zhou; I Muegge; J Bowie; J Bryngelson; J Cheng; J Ferkinghoff-Borg; J Kocher; J Moult; J Pearl; Jes Frellsen; Jesper Ferkinghoff-Borg; Jonas Paulsen; K Dill; KT Simons; M Borg; M Rooman; M Sippl; Martin Paluszewski; Mikael Borg; MJ Sippl; MY Shen; NV Buchete; P Leopold; P Májek; P Thomas; PD Thomas; R Lüthy; R Samudrala; S Huang; S Kullback; S Miyazawa; S Tanaka; Sandro Bottaro; T Hamelryck; T Lazaridis; Thomas Hamelryck; W Boomsma; W Gilks; WA Koppensteiner; WL Delano; Wouter Boomsma; Y Su



Abstract

Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials derived from pairwise distances, so-called "potentials of mean force" (PMFs), have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated, and the optimal choice of the reference state, a necessary component of these potentials, remains an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities reference ratio distributions, since they derive from the application of the reference ratio method. This new view is not only of theoretical relevance but also leads to many insights of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be applied iteratively to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
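The central idea of combining distributions can be sketched on a toy discrete system. This sketch assumes the usual formulation of the reference ratio method, which the abstract itself does not spell out. A fine-grained proposal distribution q(x) over structures x is combined with a target distribution p(y) over a coarse feature y = f(x), such as the radius of gyration:

p_hat(x) = p(y) / q(y) * q(x)

Here q(y) is the marginal of q over the feature and plays the role of the reference state. Taking -kT ln of the ratio p(y) / q(y) gives a PMF-like energy. The six "conformations", the parity feature, and all function names below are hypothetical illustration choices, not code from the paper.

```python
import math
from collections import defaultdict


def feature_marginal(dist, f):
    """Marginal over the coarse feature y = f(x) induced by a distribution dist(x)."""
    marg = defaultdict(float)
    for x, px in dist.items():
        marg[f(x)] += px
    return dict(marg)


def reference_ratio(q, p_y, f):
    """Combine a fine-grained proposal q(x) with a target p(y) over y = f(x).

    Returns p_hat(x) = p(f(x)) / q(f(x)) * q(x), where q(y), the marginal of
    q over the feature, acts as the reference state. p_y must assign a
    probability to every feature value that q can produce.
    """
    q_y = feature_marginal(q, f)
    return {x: p_y[f(x)] / q_y[f(x)] * qx for x, qx in q.items()}


def pmf_energy(p_y, q_y, kT=1.0):
    """PMF-like energy E(y) = -kT * ln(p(y) / q(y)) for each feature value y."""
    return {y: -kT * math.log(p_y[y] / q_y[y]) for y in p_y}


def parity(x):
    """Hypothetical coarse feature: 0 = 'compact', 1 = 'extended'."""
    return x % 2


# Hypothetical proposal over six toy "conformations" (sums to 1).
q = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.1, 4: 0.2, 5: 0.1}
# Hypothetical target distribution over the coarse feature.
p_y = {0: 0.5, 1: 0.5}

p_hat = reference_ratio(q, p_y, parity)
q_y = feature_marginal(q, parity)
energy = pmf_energy(p_y, q_y)

if __name__ == "__main__":
    print("q(y)     :", q_y)
    print("p_hat(y) :", feature_marginal(p_hat, parity))
    print("E(y)     :", energy)
```

In this toy case q(y) is about 0.6 and 0.4. After reweighting, the feature marginal of p_hat equals the target (0.5 and 0.5), while within each feature class p_hat keeps the relative weights of q. The energies come out at roughly 0.182 for y = 0 and -0.223 for y = 1 (with kT = 1). Here q(y) is computed exactly. For real protein models it would typically have to be estimated, for example from samples drawn from q.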