
    Potentials of Mean Force for Protein Structure Prediction Vindicated, Formalized and Generalized

    Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials derived from pairwise distances, so-called "potentials of mean force" (PMFs), have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state, a necessary component of these potentials, is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities reference ratio distributions, deriving from the application of the reference ratio method. This new view is not only of theoretical relevance but also leads to many insights of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be applied iteratively to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
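
    The abstract does not spell out the reference ratio method. As we understand it, a proposal model q(x) over structures x is combined with a target distribution p(y) over a feature y = f(x) (for example, the radius of gyration) by setting p(x) = q(x) p(y) / q(y), where the reference state q(y) is the distribution of y that q(x) itself implies. The minimal sketch below assumes a small, enumerable state space so that q(y) can be computed exactly by summation. The helper name `reference_ratio` and the toy "compact"/"extended" states are hypothetical, and the PMF-like term is taken as -log(p(y)/q(y)) in units of kT.

```python
import math
from collections import defaultdict


def reference_ratio(q_x, feature, p_y):
    """Hypothetical helper: combine a proposal q(x) with a target p(y), y = feature(x).

    q_x: dict mapping each state x to its probability under the proposal.
    feature: function mapping a state x to its feature value y.
    p_y: dict mapping each feature value y to its target probability.
    Returns (p_x, q_y, energy) with p_x[x] = q(x) * p(y) / q(y) and
    energy[y] = -log(p(y) / q(y)) in units of kT.
    """
    # Reference state: the marginal distribution of y implied by the proposal.
    q_y = defaultdict(float)
    for x, prob in q_x.items():
        q_y[feature(x)] += prob
    # Reweight each state by the ratio of target to reference feature probability.
    p_x = {x: prob * p_y[feature(x)] / q_y[feature(x)] for x, prob in q_x.items()}
    energy = {y: -math.log(p_y[y] / q_y[y]) for y in q_y}
    return p_x, dict(q_y), energy


# Toy example: four hypothetical states, each labeled compact or extended.
q_x = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
label = {"a": "compact", "b": "compact", "c": "extended", "d": "extended"}
p_y = {"compact": 0.6, "extended": 0.4}

p_x, q_y, energy = reference_ratio(q_x, label.get, p_y)
marginal = defaultdict(float)
for x, prob in p_x.items():
    marginal[label[x]] += prob
print(q_y)             # reference state implied by q
print(dict(marginal))  # equals p_y up to floating-point rounding
```

    After reweighting, the feature marginal of p(x) reproduces p(y). Under these assumptions, this shows why the reference state is fixed by q itself rather than chosen from outside physical arguments.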

    Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    This article develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model, and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.
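
    The abstract stresses that angular data are circular, so a density over backbone dihedral pairs (φ, ψ) must join smoothly across the ±180° edges. The paper enforces this with smoothness constraints on bivariate splines over a triangulation. The sketch below swaps in a much simpler technique, a product von Mises kernel density estimate on the torus, only to show how periodicity can be built in. It is not the authors' collective spline estimator. The function name `torus_kde`, the concentration `kappa` and the sample angles are illustrative assumptions.

```python
import numpy as np


def torus_kde(phi, psi, data_phi, data_psi, kappa=20.0):
    """Product von Mises kernel density estimate on the torus [-pi, pi)^2.

    phi, psi: evaluation angles in radians (scalars or arrays of equal shape).
    data_phi, data_psi: 1-D arrays of observed angle pairs (radians).
    kappa: kernel concentration (larger means narrower kernels); an assumed value.
    """
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    # Normalizing constant of a product of two von Mises densities.
    norm = (2.0 * np.pi * np.i0(kappa)) ** 2
    # cos() is 2*pi-periodic, so each kernel wraps across the +-pi boundary.
    k = np.exp(kappa * (np.cos(phi[..., None] - data_phi)
                        + np.cos(psi[..., None] - data_psi)))
    return k.sum(axis=-1) / (norm * len(data_phi))


# Hypothetical sample: two helix-like pairs and one pair near the phi = 180 deg edge.
data_phi = np.radians([-60.0, -63.0, 175.0])
data_psi = np.radians([-45.0, -40.0, 150.0])

n = 180
grid = -np.pi + 2.0 * np.pi * np.arange(n) / n
PHI, PSI = np.meshgrid(grid, grid, indexing="ij")
dens = torus_kde(PHI, PSI, data_phi, data_psi)
# Periodic rectangle rule: the total mass should be very close to 1.
print(dens.sum() * (2.0 * np.pi / n) ** 2)
```

    Because the kernel depends on the angles only through cosines, the estimate at φ = 180° and φ = -180° is the same point, which is the property the spline constraints in the paper are designed to guarantee for a far more flexible model.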

    Protein Structure Determination Using Chemical Shifts

    In this PhD thesis, a novel method to determine protein structures using chemical shifts is presented. Comment: Univ Copenhagen PhD thesis (2014) in Biochemistry

    Monte Carlo Protein Folding: Simulations of Met-Enkephalin with Solvent-Accessible Area Parameterizations

    Realistically treating the ambient water is one of the main difficulties in applying Monte Carlo methods to protein folding. The solvent-accessible area method, a popular method for treating water implicitly, is investigated by means of Metropolis simulations of the brain peptide Met-Enkephalin. For the phenomenological energy function ECEPP/2, nine atomic solvation parameter (ASP) sets that had been proposed by previous authors are studied. The simulations are compared with each other, with simulations using a distance-dependent electrostatic permittivity $\epsilon(r)$, and with vacuum simulations ($\epsilon = 2$). Parallel tempering and a recently proposed biased Metropolis technique are employed, and their performances are evaluated. The measured observables include energy and dihedral probability densities (pds), integrated autocorrelation times, and acceptance rates. Two of the ASP sets turn out to be unsuitable for these simulations. For all other sets, selected configurations are minimized in search of the global energy minima. Unique minima are found for the vacuum and the $\epsilon(r)$ systems, but for none of the ASP models. Other observables show a remarkable dependence on the ASPs. In particular, autocorrelation times vary dramatically with the ASP parameters. Three ASP sets have much smaller autocorrelations at 300 K than the vacuum simulations, opening the possibility that simulations can be sped up vastly by judiciously choosing details of the force field. Comment: 10 pages; published in "NIC Symposium 2004", eds. D. Wolf et al. (NIC, Juelich, 2004)
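
    Integrated autocorrelation times are one of the observables compared across the ASP sets. A common estimator, assumed here rather than taken from the paper, is $\tau_{\mathrm{int}} = 1 + 2\sum_{t \ge 1} \rho(t)$, where $\rho(t)$ is the normalized autocorrelation of a Monte Carlo time series. The sum is truncated by a self-consistent window, stopping at the first lag $t \ge c\,\tau_{\mathrm{int}}(t)$ with $c \approx 5$ (Sokal's windowing heuristic). The function name and the value of `c` are assumptions. Other conventions, such as $1/2 + \sum_t \rho(t)$, differ by a factor of 2.

```python
import numpy as np


def integrated_autocorrelation_time(x, c=5.0):
    """Estimate tau_int = 1 + 2 * sum_t rho(t) with a self-consistent window.

    x: 1-D Monte Carlo time series (e.g. energies recorded each sweep).
    c: window factor; summation stops at the first lag t with t >= c * tau.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    var = dx @ dx / n
    tau = 1.0
    for t in range(1, n // 2):
        # Normalized autocorrelation at lag t.
        rho = (dx[:-t] @ dx[t:]) / ((n - t) * var)
        tau += 2.0 * rho
        if t >= c * tau:
            break
    return tau


# Usage on a synthetic AR(1) series, whose exact value is (1 + a) / (1 - a).
rng = np.random.default_rng(1)
a, n = 0.5, 200_000
noise = rng.standard_normal(n)
series = np.empty(n)
series[0] = noise[0]
for i in range(1, n):
    series[i] = a * series[i - 1] + noise[i]
tau = integrated_autocorrelation_time(series)
print(tau)  # close to 3 for a = 0.5
```

    In a Metropolis run, the number of statistically independent measurements is roughly the run length divided by $\tau_{\mathrm{int}}$. This is why a smaller autocorrelation time, as reported for three ASP sets, translates into a faster simulation.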

    Skewed Factor Models Using Selection Mechanisms

    Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, in real data problems the first two moments may not be sufficient to describe the factors. Motivated by this, we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models, each based on a selection mechanism on the factors. ECME algorithms are adopted to estimate the related parameters for statistical inference. Monte Carlo simulations validate our new models, and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset.
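
    The selection mechanism behind a skew-normal factor can be illustrated with the standard construction. If $(U_0, U_1)$ is standard bivariate normal with correlation $\delta$, then $U_1$ conditioned on $U_0 > 0$ is skew-normal with shape $\alpha = \delta/\sqrt{1-\delta^2}$, mean $\delta\sqrt{2/\pi}$ and variance $1 - 2\delta^2/\pi$. The sketch below draws such factors and plugs them into a hypothetical one-factor model $y = \Lambda f + \varepsilon$. The function names, loadings and noise level are assumptions. The sketch covers neither the skew-t nor the generalized skew-normal variant, nor the ECME estimation step.

```python
import numpy as np


def skew_normal_by_selection(n, delta, rng):
    """Draw n skew-normal variates via a selection mechanism.

    (u0, u1) is standard bivariate normal with correlation delta; u1 is kept
    when u0 > 0. Instead of rejecting draws with u0 <= 0, their sign is
    flipped: (-u0, -u1) has the same joint law, so no draws are wasted.
    """
    u0 = rng.standard_normal(n)
    u1 = delta * u0 + np.sqrt(1.0 - delta**2) * rng.standard_normal(n)
    return np.where(u0 > 0, u1, -u1)


def simulate_one_factor_model(n, loadings, delta, noise_sd, rng):
    """Hypothetical one-factor model y = loadings * f + eps with a skew-normal factor f."""
    f = skew_normal_by_selection(n, delta, rng)
    eps = noise_sd * rng.standard_normal((n, len(loadings)))
    return np.outer(f, loadings) + eps


rng = np.random.default_rng(0)
f = skew_normal_by_selection(200_000, 0.8, rng)
print(f.mean(), 0.8 * np.sqrt(2.0 / np.pi))  # sample vs. theoretical mean
y = simulate_one_factor_model(1_000, np.array([1.0, 0.5, 2.0]), 0.8, 0.3, rng)
print(y.shape)  # (1000, 3)
```

    The nonzero mean and positive third moment of `f` are exactly what a Gaussian factor cannot produce. This is the gap that moments up to order two leave open.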