10 research outputs found

    PepX: a structural database of non-redundant protein–peptide complexes

    Although protein–peptide interactions are estimated to constitute up to 40% of all protein interactions, relatively little information is available for the structural details of these interactions. Peptide-mediated interactions are a prime target for drug design because they are predominantly present in signaling and regulatory networks. A reliable data set of nonredundant protein–peptide complexes is indispensable as a basis for modeling and design, but current data sets for protein–peptide interactions are often biased towards specific types of interactions or are limited to interactions with small ligands. In PepX (http://pepx.switchlab.org), we have designed an unbiased and exhaustive data set of all protein–peptide complexes available in the Protein Data Bank with peptide lengths up to 35 residues. In addition, these complexes have been clustered based on their binding interfaces rather than sequence homology, providing a set of structurally diverse protein–peptide interactions. The final data set contains 505 unique protein–peptide interface clusters from 1431 complexes. Thorough annotation of each complex with both biological and structural information facilitates searching for and browsing through individual complexes and clusters. Moreover, we provide an additional source of data for peptide design by annotating peptides with naturally occurring backbone variations using fragment clusters from the BriX database

    BriX: a database of protein building blocks for structural analysis, modeling and design

    High-resolution structures of proteins remain the most valuable source for understanding their function in the cell and provide leads for drug design. Since the availability of sufficient protein structures to tackle complex problems such as modeling backbone moves or docking remains a problem, alternative approaches using small, recurrent protein fragments have been employed. Here we present two databases that provide a vast resource for implementing such fragment-based strategies. The BriX database contains fragments from over 7000 non-homologous proteins from the Astral collection, segmented in lengths from 4 to 14 residues and clustered according to structural similarity, summing up to a content of 2 million fragments per length. To overcome the lack of loops classified in BriX, we constructed the Loop BriX database of non-regular structure elements, clustered according to end-to-end distance between the regular residues flanking the loop. Both databases are available online (http://brix.crg.es) and can be accessed through a user-friendly web-interface. For high-throughput queries a web-based API is provided, as well as full database downloads. In addition, two exciting applications are provided as online services: (i) user-submitted structures can be covered on the fly with BriX classes, representing putative structural variation throughout the protein and (ii) gaps or low-confidence regions in these structures can be bridged with matching fragments
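The segmentation step described in the BriX abstract can be pictured with a minimal sketch. It assumes a sliding window with a stride of one residue, so fragments overlap; the abstract does not state the stride. It also uses a one-letter sequence string as a stand-in for the backbone coordinates that BriX actually stores. The function name `segment_into_fragments` and the example chain are hypothetical and are not part of the BriX API.

```python
from typing import Dict, List


def segment_into_fragments(
    residues: str, min_len: int = 4, max_len: int = 14
) -> Dict[int, List[str]]:
    """Return every contiguous fragment for each length in [min_len, max_len].

    Assumption: overlapping windows with stride 1. A chain shorter than a
    given length simply yields an empty list for that length.
    """
    fragments: Dict[int, List[str]] = {}
    for length in range(min_len, max_len + 1):
        fragments[length] = [
            residues[start:start + length]
            for start in range(len(residues) - length + 1)
        ]
    return fragments


if __name__ == "__main__":
    chain = "ACDEFGHIKLMNPQRS"  # hypothetical 16-residue chain
    frags = segment_into_fragments(chain)
    # A chain of n residues gives n - L + 1 fragments of length L.
    print(len(frags[4]), len(frags[14]))
```

In a real pipeline each fragment would carry coordinates, and the structural clustering mentioned in the abstract would then be run separately for each length.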

    Analysis of conformational variation in macromolecular structural models

    Experimental conditions or the presence of interacting components can lead to variations in the structural models of macromolecules. However, the role of these factors in conformational selection is often omitted by in silico methods to extract dynamic information from protein structural models. Structures of small peptides, considered building blocks for larger macromolecular structural models, can substantially differ in the context of a larger protein. This limitation is more evident in the case of modeling large multi-subunit macromolecular complexes using structures of the individual protein components. Here we report an analysis of variations in structural models of proteins with high sequence similarity. These models were analyzed for sequence features of the protein, the role of scaffolding segments including interacting proteins or affinity tags and the chemical components in the experimental conditions. Conformational features in these structural models could be rationalized by conformational selection events, perhaps induced by experimental conditions. This analysis was performed on a non-redundant dataset of protein structures from different SCOP classes. The sequence-conformation correlations that we note here suggest additional features that could be incorporated by in silico methods to extract dynamic information from protein structural models

    Toward optimal fragment generations for ab initio protein structure assembly

    Fragment assembly using structural motifs excised from other solved proteins has been shown to be an efficient method for ab initio protein-structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are basic issues that have yet to be systematically examined. In this work, we developed a gapless-threading method to generate position-specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved accuracy comparable with that of machine-learning-based methods although the fragments were taken from unrelated proteins. Measured by the accuracy of both the derived distance profiles and the torsion angle pairs, we come to a consistent conclusion that the optimal fragment length for structural assembly is around 10 residues, and that at least 100 fragments at each location are needed to achieve optimal structure assembly. The distance profiles and torsion angle pairs derived from the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb.med.umich.edu/QUARK/. Proteins 2013. © 2012 Wiley Periodicals, Inc.

    Tertiary Alphabet for the Observable Protein Structural Universe

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence—a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure

    : Protein Long Local Structure Prediction

    A relevant and accurate description of three-dimensional (3D) protein structures can be achieved by characterizing recurrent local structures. In a previous study, we developed a library of 120 3D structural prototypes encompassing all known 11-residue-long local protein structures and ensuring a good quality of structural approximation. A local structure prediction method was also proposed. Here, overlapping properties of local protein structures in global ones are taken into account to characterize frequent local networks. At the same time, we propose a new long local structure prediction strategy which involves the use of evolutionary information coupled with Support Vector Machines (SVMs). Our prediction is evaluated by a stringent geometrical assessment. Every local structure prediction with a Cα RMSD of less than 2.5 Å from the true local structure is considered correct. A global prediction rate of 63.1% is then reached, corresponding to an improvement of 7.7 points compared with the previous strategy. In the same way, the prediction of 88.33% of the 120 structural classes is improved, with an 8.65% mean gain. 85.33% of proteins have better prediction results, with a 9.43% average gain. An analysis of prediction rate per local network also supports the global improvement and gives insights into the potential of our method for predicting super local structures. Moreover, a confidence index for the direct estimation of prediction quality is proposed. Finally, our method proves to be very competitive with cutting-edge strategies encompassing three categories of local structure predictions. Proteins 2009. (c) 2009 Wiley-Liss, Inc
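The geometric criterion quoted in this abstract (a prediction counts as correct when its Cα RMSD to the true local structure is below 2.5 Å) can be illustrated with a minimal sketch. It assumes the two fragments have already been superimposed, so no optimal-superposition step such as the Kabsch algorithm is performed here; the authors' actual evaluation procedure is not described in the abstract. The names `ca_rmsd` and `is_correct_prediction` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(predicted: Sequence[Coord], true: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equal-length C-alpha traces.

    Assumption: both traces are already in the same frame (pre-superimposed).
    """
    if len(predicted) != len(true) or not predicted:
        raise ValueError("traces must be non-empty and of equal length")
    squared = sum(
        (px - tx) ** 2 + (py - ty) ** 2 + (pz - tz) ** 2
        for (px, py, pz), (tx, ty, tz) in zip(predicted, true)
    )
    return math.sqrt(squared / len(predicted))


def is_correct_prediction(
    predicted: Sequence[Coord], true: Sequence[Coord], threshold: float = 2.5
) -> bool:
    """Apply the strict 'RMSD below threshold' rule (threshold in angstroms)."""
    return ca_rmsd(predicted, true) < threshold


if __name__ == "__main__":
    # Toy 11-residue trace along the x axis, 3.8 angstroms between atoms.
    true_trace = [(i * 3.8, 0.0, 0.0) for i in range(11)]
    # Same trace displaced by 1 angstrom in y: RMSD is 1 angstrom.
    near = [(x, y + 1.0, z) for x, y, z in true_trace]
    print(is_correct_prediction(near, true_trace))
```

A global prediction rate such as the one reported would then be the fraction of predicted fragments for which this check passes.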

    In silico characterisation of the four canonical Plasmodium falciparum 70 kDa heat shock proteins

    The 70 kDa heat shock proteins expressed by Plasmodium falciparum (PfHsp70s) are believed to be essential to both the survival and virulence of the malaria parasite. A total of six Hsp70 genes have been identified in the genome of P. falciparum. However, only four of these encode canonical Hsp70s, which are believed to localise predominantly in the cytosol (PfHsp70-1 and PfHsp70-x), the endoplasmic reticulum (PfHsp70-2) and mitochondria (PfHsp70-3) of the parasite. These proteins bind and release peptide substrates in an ATP-dependent manner, with the aid of a J-domain protein cochaperone and a nucleotide exchange factor (NEF). The aim of this study was to identify the residues involved in the interaction of these PfHsp70s with their peptide substrates, their J-domain cochaperones and potential NEFs. These residues were then mapped to three-dimensional (3D) structures of the proteins, modelled in three different conformations, each representing a different stage in the ATPase cycle. Additionally, these proteins were compared to different types of Hsp70s from a variety of organisms, and sequence features found to be specific to each PfHsp70 were mapped to their 3D structures. Finally, a novel modelling method was suggested, in which the structures of templates were remodelled to improve their quality before they were used in the homology modelling process. Based on the analysis of residues involved in interactions with other proteins, it was revealed that each PfHsp70 displayed features that were specific to its cellular localisation and each type of Hsp70 was predicted to interact with a different set of NEFs. The study of conserved features in each PfHsp70 revealed that PfHsp70-x displayed various sequence features atypical of both Plasmodium cytosolic Hsp70s and cytosolic Hsp70s in general. Additionally, residues conserved specifically in Hsp70s of Apicomplexa, Plasmodium and P. falciparum were identified and mapped to each PfHsp70 model. 
Although these residues were too numerous to reveal any information of specific value, these models may be useful for the purposes of aiding the design of drug compounds against each PfHsp70. Finally, the novel modelling approach did show some promise. Half of the models produced using the modified templates were of a higher quality than their counterparts modelled using the original templates. This approach does still require a lot of validation work and statistical evaluation. It is hoped that it could prove to be a useful approach to homology modelling when the only templates available are poor quality structures

    Secondary structure-based template selection for fragment-assembly protein structure prediction

    Proteins play critical biochemical roles in all living organisms; in human beings, they are the targets of 50% of all drugs. Although the first protein structure was determined 60 years ago, experimental techniques are still time-consuming and costly. Consequently, in silico protein structure prediction, which is considered a main challenge in computational biology, is fundamental to deciphering the conformations of protein targets. This thesis contributes to the state of the art of fragment-assembly protein structure prediction. This category has been widely and thoroughly studied because it can be applied to any type of target. While the majority of research focuses on enhancing the functions that are used to score fragments by incorporating new terms and optimising their weights, another important issue is how to pick appropriate fragments from a large pool of candidate structures. Since prediction of the main structural classes, i.e. mainly-alpha, mainly-beta and alpha-beta, has recently reached quite a high level of accuracy, we have introduced a novel approach that decreases the size of the pool of candidate structures so that it comprises only proteins that share the structural class a target is likely to adopt. Picking fragments from this customised set of known structures has not only contributed to generating decoys with a higher level of accuracy but has also eliminated irrelevant parts of the search space, which makes the selection of first models a less complicated process and addresses the inaccuracies of energy functions. In addition to the challenge of adopting a unique template structure for all targets, another one arises whenever relying on the same amount of corrections and fine tunings; such a phase may be damaging to “easy” targets, i.e. those that comprise a relatively significant percentage of alpha helices. 
Owing to the sequence-structure correlation based on which fragment-based protein structure prediction was born, we have also proposed a customised phase of correction based on the structural class prediction of the target in question. After using secondary structure prediction as a “global feature” of a target, i.e. structural classes, we have also investigated its usage as a “local feature” to customise the number of candidate fragments, which is currently the same at all positions. Relying on the known facts regarding diversity of short fragments of helices, sheets and loops, the fragment insertion process has been adjusted to make “changes” relative to the expected complexity of each region. We have proved in this thesis the extent to which secondary structure features can be used implicitly or explicitly to enhance fragment assembly protein structure prediction