thesis

Statistical approaches to the study of protein folding and energetics

Abstract

The determination of protein structure and the exploration of protein folding landscapes are two of the key problems in computational biology. Addressing these challenges requires both a protein model that accurately captures the physics of interest and an efficient sampling algorithm. The first part of this thesis documents the continued development of CRANKITE, a coarse-grained protein model, and the exploration of its energy landscape using nested sampling, a Bayesian sampling algorithm. We extend CRANKITE and optimize its parameters using a maximum likelihood approach. The efficiency of our procedure, which uses the contrastive divergence approximation, allows a large training set to be used, producing a model that is transferable to proteins not included in the training set. We develop an empirical Bayes model for the prediction of protein β-contacts, which are required inputs for CRANKITE. Our approach couples the constraints and prior knowledge associated with β-contacts to a maximum-entropy-based statistic that predicts evolutionarily related contacts. Nested sampling (NS) is a Bayesian algorithm shown to be efficient at sampling systems that exhibit a first-order phase transition. In this work we parallelize the algorithm and, for the first time, apply it to a biophysical system: small globular proteins modelled using CRANKITE. We generate energy landscape charts, which give a large-scale visualization of the protein folding landscape, and we compare the efficiency of NS to an alternative sampling technique, parallel tempering, when calculating the heat capacity of a short peptide. In the final part of the thesis we adapt the NS algorithm for use within a molecular dynamics framework and demonstrate its application by calculating the thermodynamics of all-atom models of a small peptide, comparing the results to the standard replica exchange approach. This adaptation will allow NS to be used with more realistic force fields in the future.
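
As a rough illustration of how nested sampling yields thermodynamic quantities such as the heat capacity, the following minimal sketch applies the textbook form of the algorithm to a toy one-dimensional harmonic potential. It is not the CRANKITE model or the parallel implementation described in the thesis. The function names `energy`, `nested_sampling` and `thermodynamics`, the potential, and all parameter values are assumptions made for this example, and energies are in units where k_B = 1. Because the toy potential is so simple, a replacement point below the current energy ceiling can be drawn exactly; a real protein model would need a constrained Monte Carlo or molecular dynamics walk instead.

```python
import math
import random


def energy(x):
    """Toy harmonic potential E(x) = x^2 (an assumption for this example)."""
    return x * x


def nested_sampling(n_live=500, n_iter=5000, seed=0):
    """Run nested sampling with a uniform prior on [-1, 1].

    Returns the discarded energies and the prior-volume weight of each one.
    """
    rng = random.Random(seed)
    live = [rng.uniform(-1.0, 1.0) for _ in range(n_live)]
    live_e = [energy(x) for x in live]
    dead_e, weights = [], []
    x_prev = 1.0  # fraction of prior volume enclosed before this iteration
    for i in range(1, n_iter + 1):
        # Discard the live point with the highest energy.
        worst = max(range(n_live), key=live_e.__getitem__)
        e_max = live_e[worst]
        # Expected shrinkage of the enclosed prior volume: X_i = exp(-i / K).
        x_now = math.exp(-i / n_live)
        dead_e.append(e_max)
        weights.append(x_prev - x_now)
        x_prev = x_now
        # Replace it with a point drawn uniformly from the region E < e_max.
        # For this potential that region is the interval |x| < sqrt(e_max).
        half_width = math.sqrt(e_max)
        new_x = rng.uniform(-half_width, half_width)
        live[worst] = new_x
        live_e[worst] = energy(new_x)
    return dead_e, weights


def thermodynamics(dead_e, weights, beta):
    """Partition function, mean energy and heat capacity at inverse temperature beta."""
    boltz = [w * math.exp(-beta * e) for e, w in zip(dead_e, weights)]
    z = sum(boltz)
    mean_e = sum(b * e for b, e in zip(boltz, dead_e)) / z
    mean_e2 = sum(b * e * e for b, e in zip(boltz, dead_e)) / z
    cv = beta * beta * (mean_e2 - mean_e ** 2)
    return z, mean_e, cv


if __name__ == "__main__":
    dead_e, weights = nested_sampling()
    for beta in (5.0, 10.0, 20.0):
        z, mean_e, cv = thermodynamics(dead_e, weights, beta)
        print(f"beta={beta:5.1f}  Z={z:.4f}  <E>={mean_e:.4f}  Cv={cv:.3f}")
```

A single run produces one list of energies and weights that can be reweighted to any temperature afterwards, which is what makes the method convenient for heat capacity curves. Parallel tempering, by contrast, needs a ladder of temperatures chosen in advance.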
