
    Assessing the structure of proteins and protein complexes through physical and statistical approaches

    Determining the correct state of a protein or a protein complex is of paramount importance for current medical and pharmaceutical research. The stable conformation of such systems depends on two processes: protein folding and protein-protein interaction. Both processes have been studied fruitfully over the last 50 years, yet a complete understanding has not been reached, and the accuracy and efficiency of the approaches used to study them are not yet optimal. This thesis is devoted to devising physical and statistical methods for recognizing the native state of a protein or a protein complex. The studies are mostly based on BACH, a knowledge-based potential originally designed to discriminate native structures in protein folding problems. The BACH method will be analyzed and extended: first, a new method to account for protein-solvent interaction will be presented. Then, we will describe an extension of BACH aimed at assessing the quality of protein complexes in protein-protein interaction problems. Finally, we will present a procedure for predicting the structure of a complex based on a hierarchy of approaches ranging from rigid docking up to molecular dynamics in explicit solvent. The reliability of the proposed approaches will always be benchmarked against a selection of other state-of-the-art scoring functions that obtained good results in the CASP and CAPRI competitions.
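    The abstract describes BACH only as a knowledge-based potential. As a rough illustration of the general idea behind such potentials (not BACH's actual formulation, which the abstract does not give), the sketch below derives an inverse-Boltzmann contact energy E(a, b) = -ln(p_obs(a, b) / p_ref(a, b)), in units of kT, from contacts observed in native structures. The random-mixing reference state, the pseudocount, the residue classes and the function names `contact_potential` and `score` are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def contact_potential(contacts, pseudocount=1.0):
    """Return {(a, b): energy in kT} for unordered residue-class pairs.

    contacts: (type_a, type_b) pairs observed in native structures.
    Reference state (assumption): types pair at random according to
    their overall frequencies in the contact list.
    """
    pairs = [tuple(sorted(c)) for c in contacts]
    n_pairs = len(pairs)
    pair_counts = Counter(pairs)
    type_counts = Counter(t for pair in pairs for t in pair)
    freq = {t: n / (2 * n_pairs) for t, n in type_counts.items()}

    keys = list(combinations_with_replacement(sorted(freq), 2))
    norm = n_pairs + pseudocount * len(keys)
    energies = {}
    for a, b in keys:
        p_obs = (pair_counts[(a, b)] + pseudocount) / norm
        # Unordered mixed pairs (a != b) can form in two ways.
        p_ref = freq[a] * freq[b] * (1 if a == b else 2)
        # Never-observed pairs (possible only with pseudocount=0) get +inf.
        energies[(a, b)] = -math.log(p_obs / p_ref) if p_obs > 0 else math.inf
    return energies


def score(contacts, energies):
    """Sum pair energies over a model's contacts (lower = more native-like)."""
    return sum(energies[tuple(sorted(c))] for c in contacts)
```

    With two hypothetical classes H and P, contacts that are over-represented relative to random mixing (here H-H) receive negative energies and under-represented ones (H-P) positive energies. A practical potential would also need a contact definition (for example a distance cutoff) and, as the abstract notes, a treatment of protein-solvent interaction.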

    Modeling and simulation of intrinsically disordered proteins

    This work is primarily about the development, validation and application of computer simulation models for intrinsically disordered proteins (IDPs), both in solution and in the presence of uniformly charged, ideal surfaces. The models are either coarse-grained or atomistic, and how they are applied depends on the specific purpose of each study. Both Metropolis Monte Carlo and molecular dynamics simulations were employed to run them. Regarding the coarse-grained models, a simple physical model was found to mimic the properties of flexible proteins, helping to explain how and why these proteins adsorb to surfaces under certain conditions. The same model later showed that two disordered proteins from different sources (saliva and milk) have similar structural and thermodynamic properties both in solution and when adsorbed to surfaces, leading to the hypothesis that one of them could substitute for the other in a pharmaceutical context. After a first indication that the atomistic models used until recently to simulate well-folded proteins may not be applicable to their disordered counterparts, an evaluation of several such models against experimental evidence confirmed that they do indeed produce overly collapsed IDP conformational ensembles. New models, favoring protein–water over protein–protein interactions, were then shown to produce more extended conformations, in much better agreement with each other and with experimental evidence. One of the new atomistic models was then used to characterize the structure of a disordered peptide conjugated to a small molecule, a conjugate shown to have promising therapeutic applications.
The value of computer simulations is well illustrated in this study: the insight obtainable from experiment was limited, and only through analysis of the simulations was a possible link established between the average conjugate structure and its increased antifungal activity.
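    The Metropolis Monte Carlo simulations mentioned above rest on a simple acceptance rule: a trial move that changes the energy by ΔE is accepted with probability min(1, exp(-ΔE/kT)). The sketch below applies that rule to a hypothetical one-dimensional harmonic energy rather than to the coarse-grained protein model, whose energy function the abstract does not give; the function names `metropolis_accept` and `sample` and their parameters are illustrative.

```python
import math
import random


def metropolis_accept(delta_e, kT, u):
    """Accept downhill moves always; uphill moves with probability exp(-dE/kT).

    u is a uniform random number in [0, 1), passed in to keep the rule testable.
    """
    return delta_e <= 0 or u < math.exp(-delta_e / kT)


def sample(energy, x0, step, n_steps, kT, seed=0):
    """Random-walk Metropolis sampling of a single coordinate (toy model)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step, step)  # symmetric proposal
        e_trial = energy(x_trial)
        if metropolis_accept(e_trial - e, kT, rng.random()):
            x, e = x_trial, e_trial
            accepted += 1
        # A rejected move counts the current state again; skipping this
        # would bias the sampled ensemble.
        samples.append(x)
    return samples, accepted / n_steps
```

    For E(x) = x²/2 at kT = 1 the sampled average of x² should approach kT, which is a quick sanity check of the acceptance rule; a protein model would replace the single coordinate with bead positions and the toy energy with the model's force field.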

    PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models

    Markov (state) models (MSMs) and related models of molecular kinetics have recently received a surge of interest because they can systematically reconcile simulation data from either a few long or many short simulations and allow us to analyze the essential metastable structures, thermodynamics, and kinetics of the molecular system under investigation. However, the estimation, validation, and analysis of such models are far from trivial and involve sophisticated and often numerically sensitive methods. In this work we present the open-source Python package PyEMMA (http://pyemma.org), which provides accurate and efficient algorithms for kinetic model construction. PyEMMA can read all common molecular dynamics data formats, helps in the selection of input features, provides easy access to dimension reduction algorithms such as principal component analysis (PCA) and time-lagged independent component analysis (TICA) and to clustering algorithms such as k-means, and contains estimators for MSMs, hidden Markov models, and several other models. Systematic model validation and error calculation methods are provided. PyEMMA offers a wealth of analysis functions so that the user can conveniently compute molecular observables of interest. We have derived a systematic and accurate way to coarse-grain MSMs to a few states and to illustrate the structures of the metastable states of the system. Plotting functions to produce manuscript-ready presentations of the results are available. In this work, we demonstrate the features of the software and show new methodological concepts and results produced by PyEMMA.
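    The core of MSM estimation, once frames have been clustered into discrete states, is counting transitions at a lag time τ and normalizing them into a transition matrix. The dependency-free sketch below shows that step with the simplest (non-reversible, row-normalized) maximum-likelihood estimator, a power-iteration stationary distribution, and the implied timescale t₂ = -τ / ln|λ₂| for a two-state model. It is not PyEMMA's code, and the function names are hypothetical; in PyEMMA itself this step corresponds to `pyemma.msm.estimate_markov_model(dtrajs, lag)`, which also offers a reversible estimator.

```python
import math


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j between frames `lag` steps apart
    (sliding window over one discrete trajectory)."""
    C = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        C[dtraj[t]][dtraj[t + lag]] += 1
    return C


def transition_matrix(C):
    """Row-normalize counts (non-reversible maximum-likelihood estimate)."""
    T = []
    for row in C:
        total = sum(row)
        if total == 0:
            raise ValueError("state has no outgoing counts; restrict to the connected set")
        T.append([c / total for c in row])
    return T


def stationary_distribution(T, tol=1e-12, max_iter=100000):
    """Power iteration pi <- pi T (assumes an ergodic, aperiodic chain)."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def implied_timescale_2state(T, lag):
    """t2 = -lag / ln|lambda2|; for a 2x2 stochastic matrix lambda2 = trace - 1."""
    lam2 = T[0][0] + T[1][1] - 1.0
    return -lag / math.log(abs(lam2))
```

    For the toy trajectory `[0, 0, 0, 1, 1, 0, 0, 0, 1, 1]` at lag 1 this gives T = [[2/3, 1/3], [1/3, 2/3]], a uniform stationary distribution and t₂ = 1/ln 3 ≈ 0.91 steps. Checking that implied timescales level off as τ grows is one of the validation steps the abstract refers to.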

    Using evolutionary covariance to infer protein sequence-structure relationships

    During the last half century, a deep knowledge of how proteins act has emerged from a broad range of experimental and computational methods. There are therefore now many opportunities to understand how the variety of proteins affects larger-scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure and dynamics are the three essential components for understanding proteins, and learning the relationships among them is one of the most important steps toward understanding protein mechanisms. Alongside the rapid growth in computing efficiency, the public protein databases have grown commensurately, and computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database, using co-evolution in sequences to achieve the following goals: (a) large-scale collection of co-evolution information using protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolution correlations. The results presented here show that important gains can come from improvements to sequence matching. The approach taken here is simple: the pair correlations in sequence have been decomposed into singlet terms, which discards much of the correlation information itself.
The gains shown here are encouraging, and we would like to develop a sequence matching method that retains the pair (or higher-order) correlation information directly; this should be possible by developing the sequence matching separately for different domain structures. Many-body correlations in particular have the potential to shift common perceptions in biology away from pairwise interactions, which are not actually very informative, toward higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information, such as has been initiated here for single proteins.
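    Co-evolution between two positions of a multiple sequence alignment is often quantified by the mutual information of the two columns, MI(i, j) = Σ p(a, b) ln[p(a, b) / (p(a) p(b))]. The abstract does not say which measure the thesis uses, so the sketch below shows this one standard pair measure, without the gap handling, sequence weighting or background corrections that practical methods add; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def mi_matrix(msa):
    """MI for every unordered column pair (i < j) of an ungapped alignment."""
    length = len(msa[0])
    return {
        (i, j): column_mutual_information(msa, i, j)
        for i in range(length)
        for j in range(i + 1, length)
    }
```

    In the toy alignment `["AKL", "AKL", "GEL", "GEL"]`, columns 0 and 1 change together (A pairs with K, G with E), so their MI is ln 2, while the constant column 2 carries no co-evolution signal (MI = 0). Because MI is a pair measure, capturing the higher-order correlations the abstract calls for would require going beyond this kind of pairwise statistic.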