1,026 research outputs found

    Superimposé: a 3D structural superposition server

    The Superimposé webserver performs structural similarity searches with a preference towards 3D structure-based methods. Similarities can be detected between small molecules (e.g. drugs), parts of large structures (e.g. binding sites of proteins) and entire proteins. For this purpose, a number of algorithms were implemented and various databases are provided. Superimposé assists the user regarding the selection of a suitable combination of algorithm and database. After the computation on our server infrastructure, a visual assessment of the results is provided. The structure-based in silico screening for similar drug-like compounds enables the detection of scaffold-hoppers with putatively similar effects. The possibility to find similar binding sites can be of special interest in the functional analysis of proteins. The search for structurally similar proteins allows the detection of similar folds with different backbone topology. The Superimposé server is available at: http://bioinformatics.charite.de/superimpose

    De Novo Proteins Designed From Evolutionary Principles

    Protein engineering has rapidly developed into a powerful method for the optimization, alteration, and creation of protein functions. Current protein engineering methods fall into the category of either high-throughput directed evolution techniques, or engineering through the use of computational models of protein structure. Despite significant innovation in both of these categories, neither is capable of handling the most difficult and desirable protein engineering goals. The combination of these two categories is an area of active research, and the development and testing of combination methods is the focus of this dissertation. Chapters 2 and 3 describe the development of a computational framework for de novo protein design called SEWING (Structural Extension WIth Native-fragment Graphs). In contrast to existing methods of de novo design, which attempt to design proteins that match a designer-supplied target topology, SEWING generates large numbers of diverse protein structures. We show that this strategy is highly effective at creating diverse helical backbones. Experimental characterization of SEWING designs shows that the experimental structures match the design models with sub-angstrom root mean square deviation (RMSD). Chapter 3 extends this methodology to the creation of protein interfaces. Using this method, several de novo designed proteins are created that bind their designated target. Chapter 4 describes the combination of directed evolution and computational modeling through the improvement of directed evolution techniques. In this chapter, a web tool called SwiftLib is developed, which allows rapid generation of degenerate codon libraries. SwiftLib allows protein engineers to determine optimal degenerate codon primers for the incorporation of desired sequences, such as sequence profiles generated from computational modeling and evolutionary data. 
Together, these chapters outline the creation of tools for the engineering of protein functions, and provide additional evidence that computational modeling and evolutionary principles can be combined for the improvement of protein engineering methods. Doctor of Philosophy
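The degenerate codon libraries mentioned in the abstract above can be illustrated with a small sketch. It is not SwiftLib's algorithm, which the abstract does not describe; it only shows what a single degenerate codon encodes under the standard genetic code and the IUPAC nucleotide codes. The function name `expand_degenerate_codon` is a hypothetical helper introduced for illustration.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def expand_degenerate_codon(codon):
    """Count how many concrete codons encode each residue ('*' marks a stop).

    Hypothetical helper for illustration; not part of SwiftLib.
    """
    choices = [IUPAC[base] for base in codon.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*choices))


if __name__ == "__main__":
    counts = expand_degenerate_codon("NNK")
    for residue, n in sorted(counts.items()):
        print(residue, n)
```

A library designer would compare such counts against a desired sequence profile; choosing the codon that matches the profile best is the optimization problem the abstract refers to, and it is not attempted here.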

    Structure-based prediction of protein-protein interaction sites

    Protein-protein interactions play a central role in the formation of protein complexes and the biological pathways that orchestrate virtually all cellular processes. Reliable identification of the specific amino acid residues that form the interface of a protein with one or more other proteins is critical to understanding the structural and physico-chemical basis of protein interactions and their role in key cellular processes, predicting protein complexes, validating protein interactions predicted by high throughput methods, and identifying and prioritizing drug targets in computational drug design. Because of the difficulty and the high cost of experimental characterization of interface residues, there is an urgent need for computational methods for reliably predicting protein-protein interface residues from the sequence, and when available, the structure of a query protein, and when known, its putative interacting partner. Against this background, this thesis develops improved methods for predicting protein-protein interface residues and protein-protein interfaces from the three-dimensional structure of an unbound query protein without considering information about its binding protein partner. Towards this end, we develop (i) ProtInDb (http://protindb.cs.iastate.edu), a database of protein-protein interface residues to facilitate (a) the generation of datasets of protein-protein interface residues that can be used to perform analysis of interaction sites and to train and evaluate predictors of interface residues, and (b) the visualization of interaction sites between proteins in both the amino acid sequences and the 3D protein structures, among other applications; (ii) PoInterS (http://pointers.cs.iastate.edu/), a method for predicting protein-protein interaction sites formed by spatially contiguous clusters of interface residues based on the predictions generated by a protein interface residue predictor. 
PoInterS divides a protein surface into a series of patches composed of several surface residues, and uses the outputs of the interface residue predictors to rank and select a small set of patches that are the most likely to constitute the interaction sites; and (iii) PrISE (http://prise.cs.iastate.edu/), a method for predicting protein-protein interface residues based on the similarity of the structural element formed by the query residue and its neighboring residues and the structural elements extracted from the interface and non-interface regions of proteins that are members of experimentally determined protein complexes. A structural element captures the atomic composition and solvent accessibility of a central residue and its closest neighbors in the protein structure. PrISE decomposes a query protein into a set of structural elements and searches for similar elements in a large set of proteins that belong to one or more experimentally determined complexes. The structural elements that are most similar to each structural element extracted from the query protein are then used to infer whether its central residue is or is not an interface residue. The results of our experiments using a variety of benchmark datasets show that PoInterS and PrISE generally outperform the state-of-the-art structure-based methods for predicting interaction patches and interface residues, respectively
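The patch-ranking idea described for PoInterS can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the published method: the patch definition (a central residue plus its nearest surface neighbours), the use of a mean score, and the names `build_patches` and `rank_patches` are all hypothetical choices made here.

```python
import math


def build_patches(coords, patch_size):
    """Group each surface residue with its nearest neighbours.

    coords maps a residue id to an (x, y, z) tuple. The patch definition
    is an assumption for illustration only.
    """
    patches = {}
    for center, xyz in coords.items():
        # Sort by distance to the center; ties are broken by residue id.
        nearest = sorted(coords, key=lambda r: (math.dist(xyz, coords[r]), r))
        patches[center] = nearest[:patch_size]
    return patches


def rank_patches(patches, scores, top_n):
    """Rank patches by the mean per-residue interface score (assumed rule)."""
    def mean_score(item):
        members = item[1]
        return sum(scores[r] for r in members) / len(members)

    ranked = sorted(patches.items(), key=mean_score, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy data: two clusters of surface residues along a line.
    coords = {
        "r1": (0, 0, 0), "r2": (1, 0, 0), "r3": (2, 0, 0),
        "r4": (10, 0, 0), "r5": (11, 0, 0), "r6": (12, 0, 0),
    }
    scores = {"r1": 0.1, "r2": 0.1, "r3": 0.1, "r4": 0.9, "r5": 0.9, "r6": 0.9}
    top = rank_patches(build_patches(coords, 3), scores, top_n=1)
    print(top)
```

In practice the per-residue scores would come from an interface residue predictor, as the abstract states; the toy scores above are made up for the example.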

    Connectable Components for Protein Design

    Protein design requires reusable, trustworthy, and connectable parts in order to scale to complex challenges. The recent explosion of protein structures stored within the Protein Data Bank provides a wealth of small motifs we can harvest, but we still lack tools to combine them into larger proteins. Here I explore two approaches for connecting reusable protein components on two different length scales. On the atomic scale, I build an interactive search engine for connecting chemical fragments together. Protein fragments built using this search engine recapitulate native-like protein assemblies that can be integrated into existing protein scaffolds using backbone search engines such as MaDCaT. On the protein domain scale, I quantitatively dissect structural variations in two-component systems in order to extract general principles for engineering interfacial flexibility between modular four-helix bundles. These bundles exhibit large scissoring motions where helices move towards or away from the bundle axis and these motions propagate across domain boundaries. Together, these two approaches form the beginnings of a multiscale methodology for connecting reusable protein fragments where there is a constant interplay and feedback between design of atomic structure, secondary structure, and tertiary structure. Rapid iteration, visualization, and search glue these diverse length scales together into a cohesive whole

    All-scale structural analysis of biomolecules through dynamical graph partitioning

    From femtosecond bond vibrations to millisecond domain motions, the dynamics of biomolecules spans a wide range of time and length scales. This hierarchy of overlapping scales links the molecular and biophysical details to key aspects of their functionality. However, the span of scales combined with their intricate coupling rapidly drives atomic simulation methods to their limits, thereby often resulting in the need for coarse-graining techniques which cannot take full account of the biochemical details. To overcome this tradeoff, a graph-theoretical framework inspired by multiscale community detection methods and stochastic processes is here introduced for the analysis of protein and DNA structures. Using biophysical force fields, we propose a general mapping of the 3D atomic coordinates onto an energy-weighted network that includes the physico-chemical details of interatomic bonds and interactions. Making use of a dynamics-based approach for community detection on networks, optimal partitionings of the structure are identified which are biochemically relevant over different scales. The structural organisation of the biomolecule is shown to be recovered bottom-up over the entire range of chemical, biochemical and biologically meaningful scales, directly from the atomic information of the structure, and without any reparameterisation. This methodology is applied to five proteins and an ensemble of DNA quadruplexes. In each case, multiple conformations associated with different states of the biomolecule or stages of the underlying catalytic reaction are analysed. Experimental observations are shown to be correctly captured, including the functional domains, regions of the protein with coherent dynamics such as rigid clusters, and the spontaneous closure of some enzymes in the absence of substrate. A computational mutational analysis tool is also derived which identifies both known and new residues with a significant impact on ligand binding. 
In large multimeric structures, the methodology highlights patterns of long range communication taking place between subunits. In the highly dynamic and polymorphic DNA quadruplexes, key structural features for their physical stability and signatures of their unfolding pathway are identified in the static structure. Open Access

    Analysis of interactions between ribosomal proteins and RNA structural motifs

    Background: One important goal of structural bioinformatics is to recognize and predict the interactions between protein binding sites and RNA. Recently, a comprehensive analysis of ribosomal proteins and their interactions with rRNA has been carried out. Interesting results emerged from the comparison of r-proteins within the small subunit in T. thermophilus and E. coli, supporting the idea of a core made of both RNA and proteins, conserved by evolution. Recent work also showed that ribosomal RNA is modularly composed. Motifs are generally single-stranded sequences of consecutive nucleotides (ssRNA) with characteristic folding. The role of these motifs in protein-RNA interactions has so far been only sparsely investigated. Results: This work explores the role of RNA structural motifs in the interaction of proteins with ribosomal RNA (rRNA). We analyze composition, local geometries and conformation of interface regions involving motifs such as tetraloops, kink turns and single extruded nucleotides. We construct an interaction map of protein binding sites that allows us to identify the common types of shared 3-D physicochemical binding patterns for tetraloops. Furthermore, we investigate the protein binding pockets that accommodate single extruded nucleotides either involved in kink-turns or in arbitrary RNA strands. This analysis reveals a new structural motif, called tripod. It corresponds to small pockets consisting of three amino acids arranged at the vertices of an almost equilateral triangle. We developed a search procedure for the recognition of tripods, based on an empirical tripod fingerprint. Conclusion: A comparative analysis with the overall RNA surface and interfaces shows that contact surfaces involving RNA motifs have distinctive features that may be useful for the recognition and prediction of interactions.
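The geometric part of the tripod description (three amino acids at the vertices of an almost equilateral triangle) can be sketched as a simple test. The empirical tripod fingerprint itself is not given in the abstract and is not reproduced here; the 15% tolerance, the choice of one representative point per residue, and the name `is_near_equilateral` are assumptions made for illustration.

```python
import math
from itertools import combinations


def is_near_equilateral(p1, p2, p3, tolerance=0.15):
    """Return True if the three side lengths deviate from their mean by at
    most `tolerance` (relative). The tolerance value is an assumption.
    """
    sides = [math.dist(a, b) for a, b in combinations((p1, p2, p3), 2)]
    mean_side = sum(sides) / 3.0
    if mean_side == 0.0:
        # All three points coincide, so there is no triangle to test.
        return False
    return all(abs(s - mean_side) / mean_side <= tolerance for s in sides)


if __name__ == "__main__":
    # Hypothetical representative coordinates (one point per residue).
    a = (0.0, 0.0, 0.0)
    b = (1.0, 0.0, 0.0)
    c = (0.5, math.sqrt(3) / 2, 0.0)
    print(is_near_equilateral(a, b, c))
```

A real search would also have to consider residue types and the pocket's fit to the extruded nucleotide, which this geometric check ignores.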

    FUNCTION-DRIVEN APPROACHES TO THE DESIGN OF OPTOGENETIC TOOLS

    Proteins play a wide variety of roles in biology despite being produced from a small set of common subunits; this commonality can be exploited to understand the dynamics by which proteins fold into structures and perform their manifold functions and, subsequently, design new proteins for use both in research and as nanoscale machines in industry. While this design process has classically involved residue-level redesign of existing protein backbones and, more recently, the de novo design of backbones according to geometrical parameters, the increasing complexity of optogenetic photosystems, biosensors, and other mechanisms for making use of proteins with specific functions has established a need for a design protocol that can reconcile their various structural exigencies with the function-specific elements of as wide an array of proteins as possible in order to make best use of them. Requirement-driven design eschews specific structural templates in favor of general requirements dependent on the intended function of the design, and so can exploit the vastness of protein structural space in finding solutions to increasingly complex design problems. Here, we present three new advances in the requirement-driven design of proteins as diagnostic tools, including a more general photosystem for the direct optogenetic control of protein-protein interactions, a series of algorithmic improvements to the leading implementation of requirement-driven design in the Rosetta macromolecular design software suite, and a new version of that algorithm capable of performing requirement-driven backbone design and residue-level backbone optimization simultaneously. These technologies collectively represent a significant improvement in our ability to control the activity of proteins with a wide variety of control schemes and produce functional proteins for arbitrary requirement sets more generally. Doctor of Philosophy

    Tertiary Alphabet for the Observable Protein Structural Universe

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence—a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure
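The notion that a small set of motifs can cover a large fraction of known structure can be illustrated with a greedy set-cover sketch. This is a swapped-in textbook technique, not the procedure used to derive the universal TERMs, which the abstract does not specify; the function name `greedy_motif_cover`, the toy motif names, and the representation of structure as a set of residue positions are assumptions.

```python
def greedy_motif_cover(motif_to_positions, universe, target_fraction):
    """Greedily pick motifs until `target_fraction` of positions is covered.

    motif_to_positions maps a motif name to the positions it describes.
    Returns (chosen motif names in order, fraction of the universe covered).
    """
    universe = set(universe)
    if not universe:
        return [], 0.0
    candidates = {
        name: set(positions) & universe
        for name, positions in motif_to_positions.items()
    }
    covered = set()
    chosen = []
    while candidates and len(covered) < target_fraction * len(universe):
        # Pick the motif adding the most new positions; ties go to the
        # alphabetically first name because max() keeps the first maximum.
        best = max(sorted(candidates), key=lambda n: len(candidates[n] - covered))
        gain = candidates.pop(best) - covered
        if not gain:
            break  # no remaining motif adds coverage
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / len(universe)


if __name__ == "__main__":
    # Toy data: ten positions and four hypothetical motifs.
    motifs = {"A": range(0, 5), "B": range(3, 7), "C": [7, 8], "D": [9]}
    print(greedy_motif_cover(motifs, range(10), 0.5))
    print(greedy_motif_cover(motifs, range(10), 1.0))
```

The diminishing gain per added motif in the toy run mirrors, in a very loose way, the slow growth in coverage that the abstract reports as more TERMs are added.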

    Development of computational approaches for structural classification, analysis and prediction of molecular recognition regions in proteins

    The vast and growing volume of 3D protein structural data stored in the PDB contains abundant information about macromolecular complexes, and hence, data about protein interfaces. Non-covalent contacts between amino acids are the basis of protein interactions, and they are responsible for binding affinity and specificity in biological processes. In addition, water networks in protein interfaces can also complement direct interactions, contributing significantly to molecular recognition, although their exact role is still not well understood. It is estimated that protein complexes in the PDB are substantially underrepresented due to their crystallization difficulties. Methods for automatic classification and description of the protein complexes are essential to study protein interfaces, and to propose putative binding regions. Due to this strong need, several protein-protein interaction databases have been developed. However, most of them do not take into account either protein-peptide complexes, solvent information or a proper classification of the binding regions, which are fundamental components to provide an accurate description of protein interfaces. In the first stage of my thesis, I developed the SCOWLP platform, a database and web application that structurally classifies protein binding regions at family level and accurately defines protein interfaces at atomic detail. The analysis of the results showed that protein-peptide complexes are substantially represented in the PDB, and are the only source of interacting information for several families. By clustering the family binding regions, I could identify 9,334 binding regions and 79,803 protein interfaces in the PDB. Interestingly, I observed that 65% of protein families interact with other molecules through more than one region and in 22% of the cases the same region recognizes different protein families. 
The database and web application are open to the research community (www.scowlp.org) and can greatly facilitate high-throughput comparative analysis of protein binding regions, as well as individual analysis of protein interfaces. SCOWLP and the other databases collect and classify the protein binding regions at family level, where sequence and structure homology exist. Interestingly, it has been observed that many protein families also present structural resemblances to each other, mostly across folds. Likewise, structurally similar interacting motifs (binding regions) have been identified among proteins with different folds and functions. For these reasons, I decided to explore the possibility of inferring protein binding regions independently of their fold classification. Thus, I performed the first systematic analysis of binding region conservation within all protein families that are structurally similar, calculated using non-sequential structural alignment methods. My results indicate that there is substantial molecular recognition information that could potentially be inferred among proteins beyond family level. I obtained a 6 to 8 fold enrichment of binding regions, and identified putative binding regions for 728 protein families that lack binding information. Within the results, I found protein complexes from different folds that present similar interfaces, confirming the predictive usage of the methodology. The data obtained with my approach may complement the SCOWLP family binding regions by suggesting alternative binding regions, and can be used to assist protein-protein docking experiments and facilitate rational ligand design. In the last part of my thesis, I used the interacting information contained in the SCOWLP database to help understand the role that water plays in protein interactions in terms of affinity and specificity. 
I carried out one of the first high-throughput analyses of solvent in protein interfaces for a curated dataset of transient and obligate protein complexes. Surprisingly, the results highlight the abundance of water-bridged residues in protein interfaces (40.1% of the interfacial residues), which reinforces the importance of including solvent in protein interaction studies (14.5% extra residues interacting only through water mediation). Interestingly, I also observed that obligate and transient interfaces present a comparable amount of solvent, which contrasts with the earlier view that obligate protein complexes are expected to resemble protein cores, with dry and hydrophobic interfaces. I characterized novel features of water-bridged residues in terms of secondary structure, temperature factors, residue composition, and pairing preferences that differed from direct residue-residue interactions. The results also showed relevant aspects in the mobility and energetics of water-bridged interfacial residues. Collectively, my doctoral thesis work can be summarized in the following points: 1. I developed SCOWLP, an improved framework that identifies protein interfaces and classifies protein binding regions at family level. 2. I developed a novel methodology to predict alternative binding regions among structurally similar protein families independently of the fold they belong to. 3. I performed a high-throughput analysis of water-bridged interactions contained in SCOWLP to study the role of solvent in protein interfaces. These three components of my thesis represent novel methods for exploiting existing structural information to gain insights into protein-protein interactions, key mechanisms to understand biological processes.

    Three-dimensional Structure Databases of Biological Macromolecules

    Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and “helpdesks,” as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.