18 research outputs found

    Simulated unbound structures for benchmarking of protein docking in the dockground resource

    Background: Proteins play an important role in biological processes in living organisms. Many protein functions are based on interaction with other proteins. The structural information is important for adequate description of these interactions. Sets of protein structures determined in both bound and unbound states are essential for benchmarking of the docking procedures. However, the number of such proteins in PDB is relatively small. A radical expansion of such sets is possible if the unbound structures are computationally simulated. Results: The dockground public resource provides data to improve our understanding of protein–protein interactions and to assist in the development of better tools for structural modeling of protein complexes, such as docking algorithms and scoring functions. A large set of simulated unbound protein structures was generated from the bound structures. The modeling protocol was based on 1 ns Langevin dynamics simulation. The simulated structures were validated on the ensemble of experimentally determined unbound and bound structures. The set is intended for large scale benchmarking of docking algorithms and scoring functions. Conclusions: A radical expansion of the unbound protein docking benchmark set was achieved by simulating the unbound structures. The simulated unbound structures were selected according to criteria from systematic comparison of experimentally determined bound and unbound structures. The set is publicly available at http://dockground.compbio.ku.edu

    Protein Model Docking Benchmark 2

    Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu
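
    The model-to-native Cα RMSD used to grade the models can be illustrated with a minimal sketch. It assumes the two coordinate lists are already optimally superimposed and matched residue by residue; the function name and the toy coordinates are hypothetical and are not taken from the benchmark set.

```python
import math


def ca_rmsd(model, native):
    """Root mean square deviation between matched C-alpha coordinates.

    Assumes both lists hold (x, y, z) tuples in Angstroms, in the same
    residue order, and that the structures are already superimposed.
    """
    if not model or len(model) != len(native):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Toy example: every model atom is displaced by 2 A along y.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
rmsd = ca_rmsd(model, native)
```

    A real comparison would first superimpose the model on the native structure (for example with a least-squares fit); that step is omitted here.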

    Protein Models: The Grand Challenge of protein docking

    Characterization of life processes at the molecular level requires structural details of protein–protein interactions (PPIs). The number of experimentally determined protein structures accounts only for a fraction of known proteins. This gap has to be bridged by modeling, typically using experimentally determined structures as templates to model related proteins. The fraction of experimentally determined PPI structures is even smaller than that for the individual proteins, due to a larger number of interactions than the number of individual proteins, and a greater difficulty of crystallizing protein–protein complexes. The approaches to structural modeling of PPI (docking) often have to rely on modeled structures of the interactors, especially in the case of large PPI networks. Structures of modeled proteins are typically less accurate than the ones determined by X-ray crystallography or nuclear magnetic resonance. Thus the utility of approaches to dock these structures should be assessed by thorough benchmarking, specifically designed for protein models. To be credible, such benchmarking has to be based on carefully curated sets of structures with levels of distortion typical for modeled proteins. This article presents such a suite of models built for the benchmark set of the X-ray structures from the Dockground resource (http://dockground.bioinformatics.ku.edu) by a combination of homology modeling and the Nudged Elastic Band method. For each monomer, six models were generated with predefined Cα root mean square deviation from the native structure (1, 2, ..., 6 Å). The sets and the accompanying data provide a comprehensive resource for the development of docking methodology for modeled proteins

    Development of protein-protein docking methodology and benchmarking environment

    Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. That fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. This work presents the development of comprehensive docking benchmark sets of protein models, and systematic validation of state-of-the-art docking methodologies on these sets. Thorough analysis of template-based and template-free docking performance reveals that even highly inaccurate protein models yield meaningful docking predictions. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy; the template-based docking is much less sensitive to inaccuracies of protein models than the free docking; and docking can be successfully applied to entire proteomes where most proteins are models of different accuracy

    MODELING PROTEIN INTERACTIONS THROUGH STRUCTURE ALIGNMENT

    Rapid accumulation of the experimental data on protein-protein complexes drives the paradigm shift in protein docking from "traditional" template free approaches to template based techniques. Homology docking algorithms based on sequence similarity between target and template complexes can account for ~20% of known protein-protein interactions. When homologous templates for the target complex are not available, but the structure of the target monomers is known, docking through structural alignment may provide an adequate solution. Such an algorithm was developed based on the structural comparison of monomers to co-crystallized interfaces. A library of the interfaces was generated from the biological units. The success of the structure alignment of the interfaces depends on the way the interface is defined in terms of its structural content. We performed a systematic large-scale study to find the optimal definition/size of the interface for the structure alignment-based docking applications. The performance was the best when the interface was defined with a distance cutoff of 12 Å. The structure alignment protocol was validated, for both full and partial alignment, on the DOCKGROUND benchmark sets. Both protocols performed equally for higher-accuracy models (i-RMSD ≤ 5 Å). Overall, the partial structure alignment yielded more acceptable models than the full structure alignment (86 acceptable models were provided by partial structure alignment only, compared to 31 by full structure alignment only). Most templates identified by the partial structure alignment had very low sequence identity to targets and such templates were hard to detect by sequence-based methods. Detailed analysis of the models obtained for 372 test cases concluded that templates for higher-accuracy models often shared not only local but also global structural similarity with the targets.
However, interface similarity even in these cases was more prominent, reflected in more accurate models yielded by partial structure alignment. Conservation of protein-protein interfaces was observed in very diverse proteins. For example, target complexes shared interface structural similarity not only with hetero- and homo-complexes but also, in a few cases, with crystal packing interfaces. The results indicate that the structure alignment techniques provide a much-needed addition to the docking arsenal, with the combined structure alignment and template free docking success rate significantly surpassing that of the free docking alone
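
    A distance-cutoff interface definition of the kind described above can be sketched as follows. The abstract does not say which atoms the 12 Å cutoff applies to, so this sketch assumes one representative point (for example Cα) per residue; the function name, data layout and toy coordinates are hypothetical.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=12.0):
    """Return residues of each chain lying within `cutoff` of the other chain.

    chain_a, chain_b: dict mapping residue id -> (x, y, z) in Angstroms.
    Assumption: one representative atom per residue; the source abstract
    does not specify the atom selection.
    """
    iface_a, iface_b = set(), set()
    for res_a, pos_a in chain_a.items():
        for res_b, pos_b in chain_b.items():
            if math.dist(pos_a, pos_b) <= cutoff:
                iface_a.add(res_a)
                iface_b.add(res_b)
    return sorted(iface_a), sorted(iface_b)


# Toy example: only residues 1 and 101 are within 12 A of each other.
chain_a = {1: (0.0, 0.0, 0.0), 2: (-20.0, 0.0, 0.0)}
chain_b = {101: (10.0, 0.0, 0.0), 102: (40.0, 0.0, 0.0)}
iface = interface_residues(chain_a, chain_b)
```

    Changing `cutoff` changes how much structural content the interface carries, which is the parameter the study varied.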

    Text Mining for Protein-Protein Docking

    Scientific publications are a rich but underutilized source of structural and functional information on proteins and protein interactions. Although scientific literature is intended for a human audience, text mining makes it amenable to algorithmic processing. It can focus on extracting information relevant to protein binding modes, providing specific residues that are likely to be at the binding site for a given pair of proteins. The knowledge of such residues is a powerful guide for the structural modeling of protein-protein complexes. This work combines and extends two well-established areas of research: the non-structural identification of protein-protein interactors, and structure-based detection of functional (small-ligand) sites on proteins. Text-mining based constraints for protein-protein docking is a unique research direction, which has not been explored prior to this study. Although text mining by itself is unlikely to produce docked models, it is useful in scoring of the docking predictions. Our results show that despite the presence of false positives, text mining significantly improves the docking quality. To purge false positives in the mined residues, along with the basic text mining, this work explores enhanced text mining techniques, using various language processing tools, from simple dictionaries, to WordNet (a generic word ontology), parse trees, word vectors and deep recursive neural networks. The results significantly increase confidence in the generated docking constraints and provide guidelines for the future development of this modeling approach. With the rapid growth of the body of publicly available biomedical literature, and new evolving text-mining methodologies, the approach will become more powerful and adequate to the needs of the biomedical community

    STRUCTURAL MODELING OF PROTEIN-PROTEIN INTERACTIONS USING MULTIPLE-CHAIN THREADING AND FRAGMENT ASSEMBLY

    Since its birth, the study of protein structures has progressed by leaps and bounds. However, owing to the expense and difficulty involved, the number of protein structures has not been able to catch up with the number of protein sequences and in fact has steadily lost ground. This necessitated the development of high-throughput but accurate computational algorithms capable of predicting the three-dimensional structure of proteins from their amino acid sequences. While progress has been made in the realm of protein tertiary structure prediction, the advancement in protein quaternary structure prediction has been limited by the fact that the degrees of freedom for protein complexes are even larger and even fewer protein complex structures are present in the PDB library. In fact, protein complex structure prediction to date has largely remained a docking problem where automated algorithms aim to predict the protein complex structure starting from the unbound crystal structures of its component subunits, and thus has remained largely limited in terms of scope. Secondly, since docking essentially treats the unbound subunits as "rigid bodies", it has limited accuracy when conformational change accompanies protein-protein interaction. In one of the first efforts of its kind, this study aims to develop protein complex structure prediction algorithms which require only the amino acid sequences of the interacting subunits as input. The study aimed to adapt the best features of protein tertiary structure prediction, including template detection and ab initio loop modeling, and extend them to protein-protein complexes, thus requiring simultaneous modeling of the three-dimensional structure of the component subunits as well as ensuring the correct orientation of the chains at the protein-protein interface. Essentially, the algorithms are dependent on knowledge-based statistical potentials for both fold recognition and structure modeling.
First, as a way to compare known structures of protein-protein complexes, a complex structure alignment program, MM-align, was developed. MM-align joins the chains of the complex structures to be aligned to form artificial monomers in every possible order. It then aligns them using a heuristic dynamic-programming-based approach with TM-score as the objective function. However, the traditional Needleman-Wunsch (NW) dynamic programming was redesigned to prevent the cross alignment of chains during the structure alignment process. Driven by the knowledge obtained from MM-align that protein complex structures share evolutionary relationships and the current protein complex structure library already contains homologous/structurally analogous protein quaternary structure families, a dimeric threading approach, COTH, was designed. The new threading-recombination approach boosts the protein complex structure library by combining tertiary structure templates with complex alignments. The query sequences are first aligned to complex templates using the modified dynamic programming algorithm, guided by a number of predicted structural features including ab initio binding-site predictions. Finally, a template-based complex structure prediction approach, TACOS, was designed to build full-length protein complex structures starting from the initial templates identified by COTH. TACOS fragments the aligned regions of the templates and reassembles them while building the structure of the threading-unaligned regions ab initio using a replica-exchange Monte Carlo simulation procedure. Simultaneously, TACOS also searches for the best orientation match of the component structures driven by a number of knowledge-based potential terms. Overall, TACOS is one of the first approaches capable of predicting full-length protein complex structures from sequence alone and introduces a new paradigm in the field of protein complex structure modeling
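
    The TM-score used as the alignment objective can be sketched for a fixed set of aligned residue pairs. This is only the standard scoring formula, not MM-align itself: the chain joining, the alignment search and the superposition are all omitted, and the function names and the 0.5 Å floor on the distance scale are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score, in Angstroms.

    Standard form: d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Assumption: d0 is floored at 0.5 so that short chains stay well defined.
    """
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(pair_distances, target_length):
    """TM-score of one given alignment.

    pair_distances: distances (in Angstroms) between aligned residue pairs
    after superposition; unaligned target residues contribute zero.
    target_length: number of residues in the target used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in pair_distances)
    return total / target_length


# Toy example: half of a 100-residue target aligned with zero deviation.
half_aligned = tm_score([0.0] * 50, 100)
```

    Because each pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, and an alignment search would try to maximize it.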

    A domain based protein structural modelling platform applied in the analysis of alternative splicing

    Functional families (FunFams) are a sub-classification of CATH protein domain superfamilies that cluster relatives likely to have very similar structures and functions. The functional purity of FunFams has been demonstrated by comparing against experimentally determined Enzyme Commission annotations and by checking whether known functional sites coincide with highly conserved residues in the multiple sequence alignments of FunFams. We hypothesised that clustering relatives into FunFams may help in protein structure modelling. In the first work chapter, we demonstrate the structural coherence of domains in FunFams. We then explore the usage of FunFams in protein monomer modelling. The FunFam based protocol produced higher percentages of good models compared to an HHsearch (the state-of-the-art HMM based sequence search tool) based protocol for both close and remote homologs. We developed a modelling pipeline that utilises the FunFam protocol and is able to model up to 70% of domain sequences from human and fly genomes. In the second work chapter, we explore the usage of FunFams in protein complex modelling. Our analysis demonstrated that domain-domain interfaces in FunFams tend to be conserved. The FunFam based complex modelling protocol produced significantly more good quality models when compared to a BLAST based protocol and slightly more than an HHsearch based protocol. In the final work chapter, we employ the FunFam based structural modelling tool to understand the implications of alternative splicing. We focused on isoforms derived from mutually exclusive exons (MXEs), which are more enriched in proteomics data. MXEs which could be mapped to structure show a significant tendency to be exposed to the solvent, are likely to exhibit a significant change in their physicochemical properties and to lie close to known/predicted functional sites. Our results suggest that MXE events may have a number of important roles in cells generally