
    New statistical potentials for improved protein structure prediction

    This dissertation presents a new scheme for deriving four-body contact potentials as a way to treat protein interactions in a more cooperative model. These new four-body contact potentials, denoted SET1 four-body contact potentials (sequential information included), show important gains in threading. SET2 four-body contact potentials (non-sequential information included) have also been developed to supplement SET1 by including spatial information. In addition to SET1 and SET2, we also include in threading the short-range conformational energies that we introduced previously. The combination of these different potentials shows significant improvement in threading tests on some decoy sets. Protein packing is an important aspect of computational structural biology. The icosahedron is chosen as an ideal model to fit the protein packing clusters from a set of protein structures, and a theoretical description of the packing patterns and packing regularities of the icosahedron has been proposed. We find that the order parameter (orientation function) measuring the angular overlap of directions in coordination clusters with directions of the icosahedron is 0.91, a significant improvement over the value of 0.82 for the order parameter with the face-centered cubic (fcc) lattice. Close-packing tendencies and patterns of residue packing in proteins are considered in detail, and a theoretical description of these packing regularities is proposed. Protein motion is another important field. The elastic network interpolation (ENI) model has been used to generate conformational transition intermediates of adenylate kinase (AK) based only on Cα atoms. We construct atomistic intermediates by grafting all atoms other than Cα from the open form of AK and then performing CHARMM energy minimization to remove steric conflicts and optimize the intermediate structures.
    We compare the free energy profiles of all intermediates from both the CHARMM force field and statistical energy functions. We find that the CHARMM total free energies successfully capture the two energy minima representing the open and closed forms of AK, whereas the free energies from statistical energy functions detect an energy minimum representing the semi-closed intermediate, with the LID domain closed and the NMP domain open, and a local energy minimum representing the closed form of AK.
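    Four-body contact potentials are statistical potentials over quadruplets of mutually contacting residues. The sketch below illustrates the general quasi-chemical idea, E(abcd) = -ln(f_obs / f_exp), where f_exp is the multinomial frequency expected from amino acid composition alone. It is a hypothetical toy, not the SET1/SET2 derivation: the function name, the one-point-per-residue representation and the 6.5 Å cutoff are assumptions, and sequential or spatial information is ignored.

```python
import math
from collections import Counter
from itertools import combinations


def four_body_potential(proteins, cutoff=6.5):
    """Toy quasi-chemical four-body contact potential, E(abcd) = -ln(f_obs / f_exp).

    proteins: list of proteins, each a list of (residue_type, (x, y, z)) tuples,
              i.e. one representative point per residue (assumption).
    cutoff:   hypothetical contact distance in angstroms (assumption).
    """
    observed = Counter()
    composition = Counter()
    for residues in proteins:
        composition.update(res_type for res_type, _ in residues)
        # Brute-force O(n^4) enumeration: fine for a sketch, too slow for real data sets.
        for quad in combinations(residues, 4):
            # A quadruplet counts only if all six residue pairs are in contact.
            if all(math.dist(p[1], q[1]) <= cutoff for p, q in combinations(quad, 2)):
                observed[tuple(sorted(res_type for res_type, _ in quad))] += 1

    total_obs = sum(observed.values())
    total_res = sum(composition.values())
    energies = {}
    for types, count in observed.items():
        f_obs = count / total_obs
        # Expected frequency of this type multiset if residues were drawn
        # independently according to the overall composition (multinomial).
        f_exp = math.factorial(4)
        for res_type, m in Counter(types).items():
            f_exp *= (composition[res_type] / total_res) ** m / math.factorial(m)
        energies[types] = -math.log(f_obs / f_exp)
    return energies
```

    Negative energies mark residue quadruplets that occur in contact more often than composition alone would predict; a threading score would then sum these terms over the quadruplets present in a candidate alignment.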

    Knowledge-based approaches for understanding structure-dynamics-function relationship in proteins

    Proteins accomplish their functions through conformational changes, often brought about by changes in environmental conditions or by ligand binding. Predicting the functional mechanisms of proteins is impossible without a deeper understanding of conformational transitions. Dynamics is the key link between the structure and function of proteins. The Protein Data Bank (PDB) contains multiple structures of the same protein, solved under different conditions, with different experimental methods, or in complexes with different ligands. These alternate conformations of the same protein (or of similar proteins) can provide important information about which conformational changes take place and how they are brought about. Although multiple computational approaches have been developed to predict dynamics from structural information, little work has been done to exploit this apparent, but potentially informative, redundancy in the PDB. In this work I bridge this gap by exploring various knowledge-based approaches to understanding the structure-dynamics relationship and how it translates into protein function. First, a novel method for constructing free energy landscapes for conformational changes in proteins is proposed, combining principal motions with knowledge-based potential energies and entropies from coarse-grained models of protein dynamics. Second, an innovative method for computing knowledge-based entropies for proteins using an inverse Boltzmann approach is introduced, similar to the manner in which statistical potentials were previously extracted. We hypothesize that amino acid contact changes observed in the course of conformational changes within a large set of proteins can provide information about local pairwise flexibilities or entropies.
    By combining this new entropy measure with knowledge-based potential functions, we formulate a knowledge-based free energy (KBF) function that we demonstrate outperforms other statistical potentials in its ability to identify native protein structures embedded within sets of decoys. Third, in collaboration with experimentalists, I apply the methods developed above to understand the molecular mechanisms of conformational changes in several protein systems, including cadherins and membrane transporters. This work introduces several ways in which the large amount of data in the PDB can be used to understand the principles underlying the structure-dynamics-function relationships of proteins. Results from this work have several important applications in structural bioinformatics, such as structure prediction, molecular docking, and protein engineering and design. In particular, the new KBFs developed in this dissertation have immediate applications in emerging topics such as the prediction of 3D structure from coevolving residues in sequence alignments and the identification of the phenotypic effects of mutants.
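    The combination of an inverse-Boltzmann potential with a contact-change entropy into a free energy F = E - TS can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's KBF: the entropy formula S_ij = ln(P(change | ij) / P(change)), which treats residue pairs whose contacts change more often than average as more flexible, is a hypothetical choice, as are all function names and the use of units where k_B = 1.

```python
import math


def inverse_boltzmann(counts_obs, counts_ref, kT=1.0):
    """Generic inverse-Boltzmann term -kT * ln(p_obs / p_ref) for every key seen in both tables."""
    n_obs = sum(counts_obs.values())
    n_ref = sum(counts_ref.values())
    return {
        key: -kT * math.log((counts_obs[key] / n_obs) / (counts_ref[key] / n_ref))
        for key in counts_obs
        if counts_obs[key] > 0 and counts_ref.get(key, 0) > 0
    }


def pair_entropies(changed, total):
    """Hypothetical pairwise flexibility entropy in units of k_B:
    S_ij = ln(P(contact changes | pair ij) / P(contact changes)).
    changed[k]: contacts of pair type k that break or form between conformers.
    total[k]:   all contacts of pair type k that were observed."""
    p_all = sum(changed.values()) / sum(total.values())
    return {k: math.log((changed[k] / total[k]) / p_all)
            for k in total if changed.get(k, 0) > 0}


def free_energy(contacts, energy, entropy, T=1.0):
    """F = sum over contacting pairs of (E_ij - T * S_ij); missing table entries count as 0."""
    return sum(energy.get(k, 0.0) - T * entropy.get(k, 0.0) for k in contacts)
```

    In a decoy test, `free_energy` would be evaluated on the contact list of the native structure and of every decoy, and the native should ideally receive the lowest value.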

    Statistical physics of T cell receptor development and antigen specificity

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Physics, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 147-158). Higher organisms, such as humans, have an adaptive immune system that usually enables them to combat diverse (and evolving) microbial pathogens successfully. The adaptive immune system is not preprogrammed to respond to prescribed pathogens, yet it mounts pathogen-specific responses against diverse microbes and establishes memory of past infections (the basis of vaccination). Although major advances have been made in understanding pertinent molecular and cellular phenomena, the mechanistic principles that govern many aspects of an immune response are not known. In this thesis, I illustrate how complementary approaches from the physical and life sciences can help confront this challenge. Specifically, I describe work that brings together statistical mechanics and cell biology to shed light on how key regulators of the adaptive immune system, T cells, are selected to enable pathogen-specific responses. A model of T cell development is introduced and analyzed (computationally and analytically) by employing methods from statistical physics, such as extreme value distributions and Hamiltonian minimization. Results show that selected T cell receptors are enriched in weakly interacting amino acids. Such T cell receptors recognize (i.e. bind sufficiently strongly to) pathogens through several contacts of moderate strength, each of which makes a significant contribution to overall binding. Disrupting any contact by mutating the pathogen is therefore statistically likely to abrogate T cell recognition of the mutated pathogen. We propose that this is the mechanism behind the specificity of T cells for unknown pathogens. The T cell development model is also used to discuss one way in which host genetics can influence the selection of T cells and, concomitantly, the control of HIV infection.
    A model of the T cell selection process as diffusion in a random field of immobile traps that intermittently turn "on" and "off" is developed to estimate the escape probability of dangerous T cells that could cause autoimmune disease. Finally, and importantly, throughout this thesis I describe how the theoretical studies are closely synergistic with, and complementary to, biological experiments and human clinical data. By Andrej Košmrlj. Ph.D.
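    The selection model described above can be illustrated with a toy version in which a T cell receptor (TCR) and a peptide are equal-length strings whose binding energy is a sum of site-by-site amino acid interaction energies. A TCR survives if its strongest (lowest-energy) interaction with a set of self-peptides lies between a negative-selection threshold E_n and a positive-selection threshold E_p, so the minimum over self-peptides is the extreme-value quantity. The interaction table `J`, the thresholds and all function names are hypothetical placeholders, not the thesis's parameters.

```python
import random


def binding_energy(tcr, peptide, J):
    """Additive TCR-peptide energy: sum of site-by-site amino acid interaction energies."""
    return sum(J[t][p] for t, p in zip(tcr, peptide))


def is_selected(tcr, self_peptides, J, E_n, E_p):
    """A TCR survives if its strongest (lowest-energy) self interaction is strong
    enough for positive selection (< E_p) but not so strong that it is deleted
    by negative selection (< E_n). Assumes E_n < E_p."""
    e_min = min(binding_energy(tcr, p, J) for p in self_peptides)
    return E_n <= e_min < E_p


def selected_fraction(n_tcrs, self_peptides, J, alphabet, length, E_n, E_p, seed=0):
    """Monte Carlo estimate of the fraction of random TCRs that pass selection."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_tcrs):
        tcr = "".join(rng.choice(alphabet) for _ in range(length))
        passed += is_selected(tcr, self_peptides, J, E_n, E_p)
    return passed / n_tcrs
```

    Comparing the amino acid composition of TCRs that pass `is_selected` with that of random TCRs is the kind of analysis that would reveal an enrichment in weakly interacting residues.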

    Using evolutionary covariance to infer protein sequence-structure relationships

    During the last half century, a deep knowledge of the actions of proteins has emerged from a broad range of experimental and computational methods. This means that there are now many opportunities for understanding how the variety of proteins affects larger-scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure and dynamics are the three essential components for understanding proteins. Learning about the relationships among protein sequence, structure and dynamics has become one of the most important steps toward understanding the mechanisms of proteins. Together with the rapid growth in the efficiency of computers, there has been a commensurate growth in the sizes of the public databases for proteins. The field of computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database, using co-evolution in sequences to achieve the following goals: (a) collection of co-evolution information on a large scale by using protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolution correlations. The results presented here show that important gains can come from improvements to sequence matching. The approach taken here is simple: the pair correlations in sequence have been decomposed into singlet terms, which amounts to discarding much of the correlation information itself.
    The gains shown here are encouraging, and we would like to develop a sequence matching method that retains the pair correlation information and even captures higher-order correlations directly; this should be possible by developing the sequence matching separately for different domain structures. Many-body correlations in particular have the potential to shift the common focus in biology away from pairwise interactions, which are not actually very informative, toward higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information, such as has been initiated here for single proteins.
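    Pair correlations between alignment columns are the raw material for co-evolution analysis. One common way to quantify them is the mutual information (MI) between two columns of a multiple sequence alignment, sketched below. MI is a swapped-in illustrative measure, not necessarily the statistic used in this work; the function names are hypothetical, and sequence weighting and the average product correction are deliberately omitted.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j, gap="-"):
    """Mutual information (nats) between alignment columns i and j, ignoring rows gapped in either."""
    pairs = [(seq[i], seq[j]) for seq in msa if seq[i] != gap and seq[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    p_ij = Counter(pairs)
    p_i = Counter(a for a, _ in pairs)
    p_j = Counter(b for _, b in pairs)
    return sum((c / n) * math.log((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
               for (a, b), c in p_ij.items())


def mi_table(msa, gap="-"):
    """MI for every column pair; no sequence weighting or APC correction (simplifications)."""
    length = len(msa[0])
    return {(i, j): column_mi(msa, i, j, gap) for i, j in combinations(range(length), 2)}
```

    Perfectly covarying columns such as A/K paired with L/E give MI = ln 2, while independent columns give 0. Decomposing these pair statistics into per-position (singlet) terms is exactly the simplification the abstract says discards much of the correlation information.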

    Antisense Peptide Technology for Diagnostic Tests and Bioengineering Research

    Antisense peptide technology (APT) is based on a useful heuristic algorithm for rational peptide design. It was deduced from empirical observations that peptides consisting of complementary (sense and antisense) amino acids interact with higher probability and affinity than randomly selected ones. This phenomenon is closely related to the structure of the standard genetic code table and, at the same time, is unrelated to the direction in which its codon sequence is translated. The concept of complementary peptide interaction is discussed, and its possible applications to diagnostic tests and bioengineering research are summarized. Problems and difficulties that may arise when using APT are discussed, and possible solutions are proposed. The methodology was tested using SARS-CoV-2 as an example. It is shown that the CABS-dock server accurately predicts the binding of antisense peptides to the SARS-CoV-2 receptor binding domain without requiring the binding site to be predefined. It is concluded that the benefits of APT outweigh the costs of random peptide screening and could lead to considerable savings in time and resources, especially if combined with other computational and immunochemical methods.
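    Because complementarity is defined through the standard genetic code, an antisense peptide can be derived mechanically from the coding DNA of a sense peptide. The sketch below translates the complementary strand in both reading directions (5'→3', i.e. the reverse complement, and 3'→5', i.e. the plain complement in the sense codon order), reflecting the abstract's point that complementarity does not depend on translation direction. It assumes a DNA coding sequence on the sense strand; function names are hypothetical, and it only generates candidate sequences, whereas binding is assessed in the abstract with the CABS-dock server.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons of an uppercase DNA string ('*' marks a stop codon)."""
    return "".join(CODE[dna[n:n + 3]] for n in range(0, len(dna) - len(dna) % 3, 3))


def antisense_5to3(sense_dna):
    """Antisense peptide read 5'->3' on the complementary strand (reverse complement)."""
    return translate(sense_dna.translate(COMPLEMENT)[::-1])


def antisense_3to5(sense_dna):
    """Antisense peptide read 3'->5' on the complementary strand
    (plain complement, same codon order as the sense peptide)."""
    return translate(sense_dna.translate(COMPLEMENT))
```

    For the sense sequence `ATGGCC` (Met-Ala), the reverse complement `GGCCAT` yields the antisense peptide Gly-His, while the plain complement `TACCGG` yields Tyr-Arg.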

    Combinatorial Fusion Rules to Describe Codon Assignment in the Standard Genetic Code

    We propose combinatorial fusion rules that describe codon assignment in the standard genetic code simply and uniformly for all canonical amino acids. These rules become obvious if the origin of the standard genetic code is considered as the result of a fusion of four protocodes: two dominant AU and GC protocodes and two recessive AU and GC protocodes. The biochemical meaning of the fusion rules lies in retaining the complementarity between cognate codons of the small hydrophobic amino acids and the large charged or polar amino acids within the protocodes. The proto-tRNAs were assembled in the form of two kissing hairpins, with 9-base and 10-base loops in the case of the dominant protocodes and two 9-base loops in the case of the recessive protocodes. The fusion rules reveal the connection between the stop codons, the non-canonical amino acids pyrrolysine and selenocysteine, and deviations in mitochondrial translation. Using the fusion rules, we predicted the existence of additional amino acids that are essential for the development of the standard genetic code. The validity of the proposed partition of the genetic code into dominant and recessive protocodes is considered with reference to state-of-the-art hypotheses. The formation of two aminoacyl-tRNA synthetase classes is compatible with the four-protocode partition.

    Thioflavin T triggers β-amyloid peptide (1-40) fibril formation

    Introduction: A general characteristic of aggregation is the multiple interactions and cross-feedback among distinct mechanisms occurring at different hierarchical levels. Understanding the interconversion of the different species during aggregation is very important, since emerging evidence indicates that intermediate oligomeric aggregates are the primary toxic species. In this context, the Aβ amyloid peptide provides a challenging model for studying aggregation phenomena, both for the complexity of its association process and for its direct implications in Alzheimer’s disease. Aggregate growth conditions strongly affect the final morphology and the fibrillar molecular structure, as well as the aggregation pathway, which is characterized by the occurrence of multiple transient species.
    Methods: The fluorescent dye Thioflavin T (ThT) is widely used to detect amyloid deposits and is often used in situ to study aggregation kinetics, under the hypothesis that its presence does not affect the aggregation processes under study. Here we present an experimental study of Aβ(1-40) peptide fibrillation kinetics at pH 7.4. Under the observed conditions, Aβ(1-40) undergoes aggregation only if ThT is present in solution. This phenomenon was analyzed as a function of temperature and of ThT and peptide concentrations in order to explore the underlying fibrillation mechanism. Light scattering, ThT fluorescence emission and two-photon excitation fluorescence microscopy were used in a kinetic fashion to highlight different aspects and critical phases of the aggregation pathway. Circular dichroism and FTIR measurements were used to characterize the secondary structure of the aggregates.
    Results: The selected approach gives detailed information on the time evolution of the Aβ(1-40) fibrillation process, highlighting structural changes at the molecular level and the growth and morphologies of different aggregate species. Our data show that the Aβ(1-40) fibrillation process occurs only in the presence of ThT and that the observed aggregation involves at least three different aggregation mechanisms acting in competition. In the first step, small oligomers that bind ThT are formed via a non-nucleated polymerization mechanism and represent an activated state for subsequent fibril growth. This process appears to be a rate-limiting step for two distinct fibril nucleation mechanisms, probably affected by a high degree of spatial heterogeneity.
    Conclusions: We demonstrated that under the selected experimental conditions ThT triggers the Aβ(1-40) fibrillation process (D’Amico et al. 2012). The steric and chemical properties of the ThT molecule may modulate the peptide conformation, through mechanisms similar to those that usually drive the binding of this dye to already formed amyloids. Thus, the presence of ThT in solution may shift the thermodynamic equilibrium, trapping specific, more ordered conformations prone to supramolecular assembly.

    Assessing the structure of proteins and protein complexes through physical and statistical approaches

    Determining the correct state of a protein or a protein complex is of paramount importance for current medical and pharmaceutical research. The stable conformation of such systems depends on two processes, protein folding and protein-protein interaction. Over the last 50 years, both processes have been studied fruitfully. Yet a complete understanding has still not been reached, and the accuracy and efficiency of the approaches for studying these problems are not yet optimal. This thesis is devoted to devising physical and statistical methods for recognizing the native state of a protein or a protein complex. The studies will be mostly based on BACH, a knowledge-based potential originally designed for the discrimination of native structures in protein folding problems. The BACH method will be analyzed and extended: first, a new method to account for protein-solvent interaction will be presented. Then, we will describe an extension of BACH aimed at assessing the quality of protein complexes in protein-protein interaction problems. Finally, we will present a procedure aimed at predicting the structure of a complex based on a hierarchy of approaches ranging from rigid docking up to molecular dynamics in explicit solvent. The reliability of the approaches we propose will always be benchmarked against a selection of other state-of-the-art scoring functions that obtained good results in the CASP and CAPRI competitions.
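    Benchmarking a knowledge-based potential on native-structure discrimination usually comes down to where the native structure's score falls among decoy scores. The sketch below computes two generic figures of merit, the native's Z-score relative to the decoy energy distribution and its rank among all structures. These are standard, widely used metrics offered as an illustration, not BACH's own evaluation protocol; function names are hypothetical, and lower energy is assumed to mean better.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z = (E_native - mean(E_decoys)) / std(E_decoys), using the population std.
    More negative values mean the native stands out more clearly from the decoys."""
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    return (native_energy - mu) / sigma


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native among all structures (1 = lowest energy)."""
    return 1 + sum(e < native_energy for e in decoy_energies)
```

    A scoring function that places the native at rank 1 with a strongly negative Z-score on most decoy sets is the kind of result that comparisons against other state-of-the-art scoring functions aim to establish.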

    Graph-based Approaches to Protein Structure- and Function Prediction
