    Protein Design by Mining and Sampling an Undirected Graphical Model of Evolutionary Constraints

    Evolutionary pressures on proteins to maintain structure and function have constrained their sequences over time and across species. The sequence record thus contains valuable information about the acceptable variation and covariation of amino acids among members of a protein family. When designing new members of a protein family, with an eye toward modified or improved stability or functionality, a protein engineer must uncover such constraints and design conforming sequences. This paper develops such an approach for protein design: we first mine an undirected probabilistic graphical model of a given protein family, and then use the model generatively to sample new sequences. While sampling from an undirected model is difficult in general, we present two complementary algorithms that effectively sample the sequence space constrained by our protein family model. One algorithm focuses on the high-likelihood regions of the space: sequences are generated by sampling the cliques in a graphical model according to their likelihood while maintaining neighborhood consistency. The other algorithm designs a fixed number of high-likelihood sequences that reflect the amino acid composition of the given family: a set of shuffled sequences is iteratively improved so as to increase their mean likelihood under the model. Tests on two important protein families, WW domains and PDZ domains, show that both sampling methods converge quickly and generate diverse, high-quality sets of sequences for further biological study.
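
The second algorithm in the abstract above (iteratively improving a set of shuffled sequences) can be illustrated with a minimal sketch. This is not the authors' implementation: the pairwise table `COUPLINGS`, the scoring function `log_score`, and the swap-based acceptance rule are all hypothetical stand-ins chosen for illustration. Swapping residues at the same position between two sequences keeps the per-position amino acid composition fixed, and a swap is kept only if it does not lower the combined score.

```python
import random
from collections import Counter

# Hypothetical toy model (an assumption, not from the paper): pairwise
# log-potentials between coupled positions. Unlisted pairs score 0.
COUPLINGS = {
    (0, 1): {("A", "L"): 1.0, ("G", "V"): 1.0},
    (1, 2): {("L", "K"): 0.5, ("V", "E"): 0.5},
}

def log_score(seq):
    """Unnormalized log-likelihood of one sequence under the toy model."""
    return sum(table.get((seq[i], seq[j]), 0.0)
               for (i, j), table in COUPLINGS.items())

def improve_by_swaps(seqs, n_steps=2000, seed=0):
    """Swap residues at one position between two sequences; keep the swap
    only if the combined score of the pair does not decrease."""
    rng = random.Random(seed)
    seqs = [list(s) for s in seqs]
    n_pos = len(seqs[0])
    for _ in range(n_steps):
        a, b = rng.sample(range(len(seqs)), 2)
        pos = rng.randrange(n_pos)
        before = log_score(seqs[a]) + log_score(seqs[b])
        seqs[a][pos], seqs[b][pos] = seqs[b][pos], seqs[a][pos]
        after = log_score(seqs[a]) + log_score(seqs[b])
        if after < before:
            # Revert a swap that made the pair less likely.
            seqs[a][pos], seqs[b][pos] = seqs[b][pos], seqs[a][pos]
    return ["".join(s) for s in seqs]

def column_composition(seqs):
    """Residue counts at each position across the whole set."""
    return [Counter(col) for col in zip(*seqs)]
```

Because only same-position swaps are allowed, the total score over the set can never decrease and the column composition of the input set is preserved exactly.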

    Euclidean distance geometry and applications

    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of its most important applications: molecular conformation, localization of sensor networks, and statics. Comment: 64 pages, 21 figures.
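
As a small worked instance of "a set of points in Euclidean space that realizes the given distances", the sketch below places three points in the plane from their three pairwise distances. It only covers this simplest complete case, not the incomplete-distance problems the survey addresses; the function name `realize_triangle` is an illustrative choice.

```python
import math

def realize_triangle(d01, d02, d12):
    """Return 2D coordinates for points 0, 1, 2 with the given pairwise
    distances. Point 0 sits at the origin and point 1 on the x-axis."""
    if d01 <= 0:
        raise ValueError("d01 must be positive")
    # Law of cosines gives the x-coordinate of point 2.
    x = (d01**2 + d02**2 - d12**2) / (2 * d01)
    y_sq = d02**2 - x**2
    if y_sq < -1e-12:
        raise ValueError("distances violate the triangle inequality")
    y = math.sqrt(max(y_sq, 0.0))
    return [(0.0, 0.0), (d01, 0.0), (x, y)]

points = realize_triangle(3.0, 4.0, 5.0)
```

The realization is unique only up to rotation, translation, and reflection; fixing point 0 at the origin and point 1 on the x-axis removes that freedom except for the sign of `y`.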

    MATEDA: A suite of EDA programs in Matlab

    This paper describes MATEDA-2.0, a suite of programs in Matlab for estimation of distribution algorithms. The package allows the optimization of single- and multi-objective problems with estimation of distribution algorithms (EDAs) based on undirected graphical models and Bayesian networks. The implementation is designed to let the user combine different selection, learning, sampling, and local search procedures. Other included methods allow the analysis of the structures learned by the probabilistic models, the visualization of particular features of these structures, and the use of the probabilistic models as fitness modeling tools.
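
The select, learn, and sample loop that EDAs share can be sketched briefly. The example below is not MATEDA code and does not use its API: it is written in Python rather than Matlab, and it uses the univariate marginal model of UMDA, a much simpler model than the undirected graphical models and Bayesian networks the package supports. The OneMax objective (count of ones) and all parameter values are assumptions chosen for illustration.

```python
import random

def umda_onemax(n_bits=20, pop_size=60, n_select=30, n_gens=40, seed=0):
    """Univariate EDA maximizing the number of ones in a bit string."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # model: independent P(bit_i = 1)
    best = None
    for _ in range(n_gens):
        # Sampling step: draw a population from the current model.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        # Selection step: truncation selection on fitness (sum of bits).
        pop.sort(key=sum, reverse=True)
        if best is None or sum(pop[0]) > sum(best):
            best = pop[0]
        selected = pop[:n_select]
        # Learning step: re-estimate marginals, clamped away from 0 and 1
        # so that no bit value becomes impossible to sample.
        probs = [min(0.95, max(0.05,
                     sum(ind[i] for ind in selected) / n_select))
                 for i in range(n_bits)]
    return best

best = umda_onemax()
```

Swapping the learning and sampling steps for a richer probabilistic model, while leaving the rest of the loop unchanged, is the kind of recombination the abstract describes.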

    Optimization methods for side-chain positioning and macromolecular docking

    This dissertation proposes new optimization algorithms targeting protein-protein docking, an important class of problems in computational structural biology. The ultimate goal of docking methods is to predict the 3-dimensional structure of a stable protein-protein complex. We study two specific problems encountered in predictive docking of proteins. The first problem is Side-Chain Positioning (SCP), a central component of homology modeling and computational protein docking methods. We formulate SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. Our formulation also considers the significant special structure of proteins that SCP exhibits for docking. We develop an approximate algorithm that solves a relaxation of MWIS and employ randomized estimation heuristics to obtain high-quality feasible solutions to the problem. The algorithm is fully distributed and can be implemented on multi-processor architectures. Our computational results on a benchmark set of protein complexes show that the accuracy of our approximate MWIS-based algorithm is comparable with that of a state-of-the-art method that finds an exact solution to SCP. The second problem we target in this work is protein docking refinement. We propose two different methods to solve the refinement problem. The first approach is based on a Monte Carlo-Minimization (MCM) search to optimize rigid-body and side-chain conformations for binding. In particular, we study the impact of optimally positioning the side-chains in the interface region between two proteins in the process of binding. We report computational results showing that incorporating side-chain flexibility in docking substantially improves the quality of docked predictions compared to rigid-body approaches. Further, we demonstrate that including unbound side-chain conformers in the side-chain search significantly improves the performance of the docking refinement protocols. In the second approach, we propose a novel stochastic optimization algorithm based on Subspace Semi-Definite programming-based Underestimation (SSDU), which aims to solve protein docking and protein structure prediction. SSDU is based on underestimating the binding energy function in a permissive subspace of the space of rigid-body motions. We apply Principal Component Analysis (PCA) to determine the permissive subspace and reduce the dimensionality of the conformational search space. We consider the general class of convex polynomial underestimators, and formulate the problem of finding such underestimators as a Semi-Definite Programming (SDP) problem. Using these underestimators, we perform a biased sampling in the vicinity of the conformational regions where the energy function is at its global minimum. Moreover, we develop an exploration procedure based on density-based clustering to detect the near-native regions even when there are many local minima residing far from each other. We also incorporate a Model Selection procedure into SSDU to pick a predictive conformation. Testing our algorithm on a benchmark of protein complexes indicates that SSDU substantially improves the quality of docking refinement compared with existing methods.
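
To make the Maximum Weighted Independent Set formulation mentioned in the abstract above concrete, here is a minimal brute-force sketch on a toy graph. It is not the dissertation's algorithm, which solves a relaxation with randomized heuristics, and the graph is a hypothetical example rather than the dissertation's construction: node names such as `r1a` stand for made-up rotamer choices, and edges mark choices assumed to be incompatible.

```python
from itertools import combinations

def max_weight_independent_set(weights, edges):
    """Exhaustively find the heaviest set of nodes with no edge between
    any two of them. Exponential time; suitable for tiny graphs only."""
    nodes = list(weights)
    edge_set = {frozenset(e) for e in edges}
    best_set, best_weight = (), 0.0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # Skip subsets that contain both endpoints of some edge.
            if any(frozenset(p) in edge_set
                   for p in combinations(subset, 2)):
                continue
            w = sum(weights[n] for n in subset)
            if w > best_weight:
                best_set, best_weight = subset, w
    return set(best_set), best_weight

# Hypothetical toy instance: two residues with two rotamers each.
weights = {"r1a": 2.0, "r1b": 1.5, "r2a": 1.0, "r2b": 2.5}
edges = [("r1a", "r1b"),   # one rotamer per residue
         ("r2a", "r2b"),
         ("r1a", "r2b")]   # assumed steric clash
chosen, total = max_weight_independent_set(weights, edges)
```

On this instance the heaviest single nodes `r1a` and `r2b` cannot be combined, so the optimum pairs `r2b` with the lighter `r1b`, which is the kind of trade-off that makes the problem combinatorial.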