78 research outputs found
Contact Prediction is Hardest for the Most Informative Contacts, but Improves with the Incorporation of Contact Potentials
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts, which are precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such "informative" contacts, one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
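The dependence of prediction success on the contact definition can be illustrated with a small sketch. It is not the paper's protocol: the function name `top_k_ppv`, the minimum sequence separation, the distance cutoffs, and the synthetic coordinates and scores are all assumptions made for illustration. It only shows that, for a fixed set of top-scoring pairs, a more permissive distance cutoff can never lower the positive predictive value.

```python
import numpy as np

def top_k_ppv(scores, dist, cutoff, k, min_sep=6):
    """Fraction of the k highest-scoring residue pairs closer than `cutoff`.

    scores, dist: (n, n) arrays; only pairs with j - i >= min_sep are used.
    """
    n = scores.shape[0]
    i, j = np.triu_indices(n, k=min_sep)          # pairs separated in sequence
    order = np.argsort(scores[i, j])[::-1][:k]    # top-k pairs by score
    return float(np.mean(dist[i[order], j[order]] < cutoff))

# Synthetic data (hypothetical): a random-walk "backbone" and noisy scores.
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(size=(60, 3)) * 2.0, axis=0)
dist = np.linalg.norm(coords[:, None] - coords[None], axis=-1)
scores = -dist + rng.normal(scale=3.0, size=dist.shape)

# Same predictions, three contact definitions (cutoffs in arbitrary units).
ppv_by_cutoff = {c: top_k_ppv(scores, dist, c, k=30) for c in (6.0, 8.0, 10.0)}
print(ppv_by_cutoff)
```

Because the ranked pairs are the same for every cutoff, any gain at a larger cutoff comes purely from the looser definition, not from better predictions.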
Tertiary Alphabet for the Observable Protein Structural Universe
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ~600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
Computational approaches for the design and prediction of protein-protein interactions
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biology, 2007. Includes bibliographical references (leaves 167-187). There is a large class of applications in computational structural biology for which atomic-level representation is crucial for understanding the underlying biological phenomena, yet explicit atomic-level modeling is computationally prohibitive. Computational protein design, homology modeling, protein interaction prediction, docking and structure recognition are among these applications. Models that are commonly applied to these problems combine atomic-level representation with assumptions and approximations that make them computationally feasible. In this thesis I focus on several aspects of this type of modeling, analyze its limitations, propose improvements and explore applications to the design and prediction of protein-protein interactions. By Gevorg Grigoryan. Ph.D.
Coarse-graining protein energetics in sequence variables
We show that cluster expansions (CE), previously used to model solid-state
materials with binary or ternary configurational disorder, can be extended to
the protein design problem. We present a generalized CE framework, in which
properties such as energy can be unambiguously expanded in the amino-acid
sequence space. The CE coarse grains over nonsequence degrees of freedom (e.g.,
side-chain conformations) and thereby simplifies the problem of designing
proteins, or predicting the compatibility of a sequence with a given structure,
by many orders of magnitude. The CE is physically transparent, and can be
evaluated through linear regression on the energies of training sequences. We
show, as an example, that good prediction accuracy is obtained with up to pairwise
interactions for a coiled-coil backbone, and that triplet interactions are
important in the energetics of a more globular zinc-finger backbone. Comment: 10 pages, 3 figures
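The idea of fitting a cluster expansion by linear regression on training-sequence energies can be sketched as follows. This is a toy illustration, not the authors' implementation: the three-letter alphabet, the helper names (`ce_features`, `fit_ce`, `predict_ce`), and the made-up `toy_energy` function are all assumptions. The expansion here is truncated at pair terms, with the first letter of the alphabet as the reference state.

```python
import itertools
import numpy as np

def ce_features(seq, alphabet):
    """Constant, point, and pair cluster functions for one sequence."""
    # Indicator of each non-reference letter at each site (alphabet[0] = reference).
    point = np.array([[1.0 if aa == b else 0.0 for b in alphabet[1:]] for aa in seq])
    feats = [np.ones(1), point.ravel()]
    for i, j in itertools.combinations(range(len(seq)), 2):
        feats.append(np.outer(point[i], point[j]).ravel())  # pair terms
    return np.concatenate(feats)

def fit_ce(seqs, energies, alphabet):
    """Least-squares fit of the expansion coefficients."""
    X = np.array([ce_features(s, alphabet) for s in seqs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(energies, dtype=float), rcond=None)
    return coef

def predict_ce(seq, coef, alphabet):
    return float(ce_features(seq, alphabet) @ coef)

def toy_energy(seq):
    """Hypothetical stand-in for an expensive structure-based energy."""
    site = {"A": 0.0, "L": -1.0, "K": 0.5}
    e = sum(site[a] for a in seq)
    if seq[0] == "L" and seq[2] == "L":
        e -= 2.0   # favorable pair interaction
    if seq[0] == "K" and seq[1] == "K":
        e += 1.5   # unfavorable pair interaction
    return e

alphabet = "ALK"
seqs = ["".join(p) for p in itertools.product(alphabet, repeat=3)]
energies = [toy_energy(s) for s in seqs]
coef = fit_ce(seqs, energies, alphabet)
print(predict_ce("LKL", coef, alphabet), toy_energy("LKL"))
```

Because the toy energy contains only single-site and pair contributions, the pairwise expansion reproduces it; an energy with genuine triplet terms would leave a residual unless triplet cluster functions were added.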
Ultra-Fast Evaluation of Protein Energies Directly from Sequence
The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10^7 compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1–4.7 kcal/mol, R^2 = 0.7–1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets (a coiled coil, a zinc finger, and a WW domain) as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages, CE is likely to find many uses in computational structural modeling.
De novo design of a transmembrane Zn²⁺-transporting four-helix bundle.
The design of functional membrane proteins from first principles represents a grand challenge in chemistry and structural biology. Here, we report the design of a membrane-spanning, four-helical bundle that transports first-row transition metal ions Zn²⁺ and Co²⁺, but not Ca²⁺, across membranes. The conduction path was designed to contain two di-metal binding sites that bind with negative cooperativity. X-ray crystallography and solid-state and solution nuclear magnetic resonance indicate that the overall helical bundle is formed from two tightly interacting pairs of helices, which form individual domains that interact weakly along a more dynamic interface. Vesicle flux experiments show that as Zn²⁺ ions diffuse down their concentration gradients, protons are antiported. These experiments illustrate the feasibility of designing membrane proteins with predefined structural and dynamic properties.
Structural analysis of cross α-helical nanotubes provides insight into the designability of filamentous peptide nanomaterials
The exquisite structure-function correlations observed in filamentous protein assemblies provide a paradigm for the design of synthetic peptide-based nanomaterials. However, the plasticity of quaternary structure in sequence-space and the lability of helical symmetry present significant challenges to the de novo design and structural analysis of such filaments. Here, we describe a rational approach to design self-assembling peptide nanotubes based on controlling lateral interactions between protofilaments having an unusual cross-α supramolecular architecture. Near-atomic resolution cryo-EM structural analysis of seven designed nanotubes provides insight into the designability of interfaces within these synthetic peptide assemblies and identifies a non-native structural interaction based on a pair of arginine residues. This arginine clasp motif can robustly mediate cohesive interactions between protofilaments within the cross-α nanotubes. The structure of the resultant assemblies can be controlled through the sequence and length of the peptide subunits, which generates synthetic peptide filaments of similar dimensions to flagella and pili.
Protein-Directed Self-Assembly of a Fullerene Crystal
Learning to engineer self-assembly would enable the precise organization of molecules by design to create matter with tailored properties. Here we demonstrate that proteins can direct the self-assembly of buckminsterfullerene (C60) into ordered superstructures. A previously engineered tetrameric helical bundle binds C60 in solution, rendering it water soluble. Two tetramers associate with one C60, promoting further organization revealed in a 1.67-Å crystal structure. Fullerene groups occupy periodic lattice sites, sandwiched between two Tyr residues from adjacent tetramers. Strikingly, the assembly exhibits high charge conductance, whereas both the protein-alone crystal and amorphous C60 are electrically insulating. The affinity of C60 for its crystal-binding site is estimated to be in the nanomolar range, with lattices of known protein crystals geometrically compatible with incorporating the motif. Taken together, these findings suggest a new means of organizing fullerene molecules into a rich variety of lattices to generate new properties by design.
- …