    Computation of protein geometry and its applications: Packing and function prediction

    This chapter discusses geometric models of biomolecules and geometric constructs, including the union-of-balls model, the weighted Voronoi diagram, the weighted Delaunay triangulation, and alpha shapes. These constructs enable fast, analytical computation of biomolecular shape (including features such as voids and pockets) and of metric properties such as area and volume. Algorithms for Delaunay triangulation, for computing voids and pockets, and for computing volume and area are also described. In addition, applications to packing analysis of protein structures and to protein function prediction are discussed. Comment: 32 pages, 9 figures
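The chapter computes area and volume analytically from the weighted Delaunay triangulation and alpha shapes. As an independent sanity check on such analytical values, the volume of a union of balls can also be estimated by plain Monte Carlo sampling. The sketch below is that swapped-in technique, not the chapter's algorithm; the function names `ball_volume` and `union_of_balls_volume_mc` and the default sample count are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

# One ball per atom: centre coordinates and radius (e.g. a van der Waals radius).
Ball = Tuple[float, float, float, float]  # (x, y, z, r)


def ball_volume(r: float) -> float:
    """Exact volume of a single ball, used as a reference value."""
    return 4.0 / 3.0 * math.pi * r ** 3


def union_of_balls_volume_mc(balls: List[Ball], n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the volume of a union of balls.

    Points are drawn uniformly in the axis-aligned bounding box of all balls;
    the box volume times the fraction of points inside at least one ball
    estimates the union volume, so overlapping regions are counted only once.
    """
    rng = random.Random(seed)
    lo = [min(b[i] - b[3] for b in balls) for i in range(3)]
    hi = [max(b[i] + b[3] for b in balls) for i in range(3)]
    box_volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])
    hits = 0
    for _ in range(n_samples):
        px, py, pz = (rng.uniform(lo[i], hi[i]) for i in range(3))
        for x, y, z, r in balls:
            if (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r:
                hits += 1
                break  # inside the union; no need to test further balls
    return box_volume * hits / n_samples
```

For two unit balls whose centres are 1 apart, the exact union volume is 2·(4π/3) - 5π/12 (two ball volumes minus the shared lens), and the estimate should agree to within a few hundredths. The chapter's analytical constructs avoid this sampling error altogether.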

    Efficient comprehensive scoring of docked protein complexes - a machine learning approach

    Biological systems and processes rely on a complex network of molecular interactions. The association of biological macromolecules is a fundamental biochemical phenomenon and an unsolved theoretical problem crucial for understanding complex living systems. The term protein-protein docking describes the computational prediction of the assembly of protein complexes from their individual subunits. Docking algorithms generally produce a large number of putative protein complexes. In most cases, some of these conformations resemble the native complex structure within an acceptable degree of structural similarity. A major challenge in the field of docking is to extract the near-native structure(s) from this large pool of solutions: the so-called scoring or ranking problem. The aim of this work was to develop methods for the efficient and accurate detection of near-native conformations when scoring or ranking docked protein-protein complexes. A series of structural, chemical, biological and physical properties are used to score docked protein-protein complexes. These properties include specialised energy functions, evolutionary relationships, class-specific residue interface propensities, gap volume, buried surface area, empirical pair potentials at the residue and atom level, and measures of the tightness of fit. Efficient comprehensive scoring functions were developed by combining probabilistic Support Vector Machines with this array of properties on the largest currently available protein-protein docking benchmark. The resulting scoring functions are specific to certain types of protein-protein complexes and detect near-native complex conformations in large sets of decoys with high sensitivity.
The specific complex classes are Enzyme-Inhibitor/Substrate complexes, Antibody-Antigen complexes, and a third class denoted "Other", which holds all test cases belonging to neither of the two previous classes. The three class-specific scoring functions were tested on the docking results of 99 complexes in their unbound form across these categories. Defining success as scoring a 'true' result with a p-value better than 0.1, the scoring schemes were successful in 93%, 78% and 63% of the examined cases, respectively. The ranking of near-native structures can be drastically improved, leading to a significant enrichment of near-native complex conformations in the top ranks. The developed scoring schemes were shown to outperform five other previously published scoring functions.
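The abstract defines success as scoring a near-native result with a p-value better than 0.1 and reports enrichment of near-native conformations in the top ranks, without spelling out how either quantity is computed. One common way to obtain both from a scored list of docking decoys is sketched below; the empirical p-value definition (the near-native score's rank among the decoys, with the (k + 1)/(n + 1) convention), the assumption that higher scores are better, and the helper names are assumptions, not necessarily the thesis's procedure.

```python
from typing import Sequence


def empirical_p_value(near_native_score: float, decoy_scores: Sequence[float]) -> float:
    """Fraction of decoys scoring at least as well as the near-native pose.

    Assumes higher scores are better. The (k + 1) / (n + 1) convention keeps
    the estimate strictly positive.
    """
    k = sum(1 for s in decoy_scores if s >= near_native_score)
    return (k + 1) / (len(decoy_scores) + 1)


def enrichment_at_top(scores: Sequence[float], is_near_native: Sequence[bool], top_n: int) -> float:
    """Near-native fraction among the top-N ranked poses divided by the overall fraction."""
    n_hits = sum(is_near_native)
    if n_hits == 0:
        raise ValueError("no near-native poses in the set")
    ranked = sorted(zip(scores, is_near_native), key=lambda t: t[0], reverse=True)
    top = ranked[:top_n]
    frac_top = sum(1 for _, hit in top if hit) / len(top)
    frac_all = n_hits / len(is_near_native)
    return frac_top / frac_all
```

Under this definition, a near-native pose would count as a success when `empirical_p_value(...) < 0.1`, i.e. when fewer than roughly 10% of the decoys score at least as well.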

    Structure-based Prediction of Protein-protein Interaction Networks across Proteomes

    Protein-protein interactions (PPIs) orchestrate virtually all cellular processes; therefore, their exhaustive exploration is essential for a comprehensive understanding of cellular networks. Significant efforts have been devoted to expanding the coverage of the proteome-wide interaction space at the molecular level. A number of experimental techniques have been developed to discover PPIs; however, these approaches have limitations such as high costs, long experiment times, noisy data sets, often high false-positive rates, and inter-study discrepancies. Given these experimental limitations, computational methods are becoming increasingly important for the detection and structural characterization of PPIs. To that end, we have developed a novel pipeline for high-throughput PPI prediction based on all-to-all rigid-body docking of protein structures. We focus on two questions: ‘how do proteins interact?’ and ‘which proteins interact?’. The method combines molecular modeling, structural bioinformatics, machine learning, and functional annotation data to answer these questions, and it can be used for genome-wide molecular reconstruction of protein-protein interaction networks. As a proof of concept, 61,913 protein-protein interactions were confidently predicted and modeled for the proteome of E. coli. We further validated our method against a few human pathways. The modeling protocol described in this communication can be applied to detect protein-protein interactions in other organisms, to construct dimer structures, and to estimate the confidence of protein interactions identified experimentally with high-throughput techniques.
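All-to-all docking means the number of docking runs grows quadratically with the number of structures: n structures give n(n + 1)/2 unordered pairs when self-pairs (homodimers) are included. The sketch below shows only this enumerate-dock-filter loop; `score_pair` is a hypothetical placeholder for the whole per-pair protocol (docking, model scoring and the machine-learning confidence), and the threshold is an assumed parameter.

```python
from itertools import combinations_with_replacement
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def all_to_all_pairs(proteins: List[str]) -> List[Pair]:
    """Every unordered pair of structures, including self-pairs (homodimers).

    For n structures this yields n * (n + 1) / 2 docking runs.
    """
    return list(combinations_with_replacement(proteins, 2))


def predict_network(
    proteins: List[str],
    score_pair: Callable[[str, str], float],
    threshold: float,
) -> Dict[Pair, float]:
    """Dock every pair and keep those whose confidence reaches `threshold`.

    `score_pair` is a hypothetical stand-in for the per-pair protocol
    (rigid-body docking, model scoring, machine-learning confidence).
    """
    network: Dict[Pair, float] = {}
    for a, b in all_to_all_pairs(proteins):
        confidence = score_pair(a, b)
        if confidence >= threshold:
            network[(a, b)] = confidence
    return network
```

For example, 1,000 structures would already require 500,500 docking runs, which illustrates why such a pipeline has to be high-throughput.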

    Improving the resolution of interaction maps: A middleground between high-resolution complexes and genome-wide interactomes

    Protein-protein interactions are ubiquitous in biology and therefore central to understanding living organisms. In recent years, large-scale studies have been undertaken to describe, at least partially, protein-protein interaction maps, or interactomes, for a number of relevant organisms including human. Although the analysis of interaction networks is proving useful, current interactomes provide a blurry and granular picture of the molecular machinery: unless the structure of the protein complex is known, the molecular details of the interaction are missing, and sometimes it is not even possible to know whether the interaction between two proteins is direct (i.e. a physical interaction) or part of a functional, not necessarily direct, association. Unfortunately, the determination of the structures of protein complexes cannot keep pace with the discovery of new protein-protein interactions, resulting in a large and increasing gap between the number of complexes thought to exist and the number for which 3D structures are available. The aim of the thesis was to tackle this problem by implementing computational approaches to derive structural models of protein complexes and thus reduce this gap. Over the course of the thesis, a novel modelling algorithm to predict the structure of protein complexes, V-D2OCK, was implemented. This algorithm combines structure-based prediction of protein binding sites, using VORFFIP and M-VORFFIP (also developed over the course of the thesis), with data-driven docking and energy minimization. It was used to improve the coverage and structural content of the human interactome, compiled from different sources of interactomic data to make it as comprehensive as possible.
Finally, the human interactome and structural models were compiled in a database, V-D2OCK DB, that offers easy, user-friendly access to the human interactome, including a bespoke graphical molecular viewer to facilitate the analysis of the structural models of protein complexes. Furthermore, organisms other than human were included, providing a useful resource for the study of all known interactomes.
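V-D2OCK combines predicted binding sites with data-driven docking. One simple way to use binding-site predictions in that setting is to rank docking poses by how many predicted interface residues actually end up in contact. The sketch below assumes each pose has already been reduced to a set of contacting residue pairs and that predictions are given as sets of residue identifiers; the scoring rule and all names are hypothetical, not the actual interface of V-D2OCK or VORFFIP.

```python
from typing import Dict, List, Set, Tuple

# A docking pose reduced to its interface: (receptor residue, ligand residue) pairs in contact.
Pose = Set[Tuple[str, str]]


def restraint_satisfaction(pose: Pose, predicted_rec: Set[str], predicted_lig: Set[str]) -> float:
    """Fraction of predicted binding-site residues (both partners) found at the pose's interface."""
    total = len(predicted_rec) + len(predicted_lig)
    if total == 0:
        return 0.0
    rec_at_interface = {r for r, _ in pose}
    lig_at_interface = {l for _, l in pose}
    satisfied = len(predicted_rec & rec_at_interface) + len(predicted_lig & lig_at_interface)
    return satisfied / total


def rank_poses(poses: Dict[str, Pose], predicted_rec: Set[str],
               predicted_lig: Set[str]) -> List[Tuple[str, float]]:
    """Poses sorted from best to worst agreement with the predicted binding sites."""
    scored = [(name, restraint_satisfaction(p, predicted_rec, predicted_lig))
              for name, p in poses.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

In a full pipeline this agreement would typically be combined with a docking energy rather than used on its own.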

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    A huge amount of genetic information is available thanks to recent advances in sequencing technologies and greater computational capabilities, but the interpretation of such genetic data at the phenotypic level remains elusive. One of the reasons is that proteins do not act alone: they interact specifically with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design to drug discovery. However, despite advances in experimental structure determination, the number of interactions for which structural data are available is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which usually consists of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring to identify the correct orientations. In addition, prediction of interface and hot-spot residues is very useful for guiding and interpreting mutagenesis experiments, as well as for understanding functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions. Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program.
We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology. Peer Reviewed. Postprint (author's final draft).
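The two docking phases named above, (i) sampling binding modes and (ii) scoring them, can be illustrated with a deliberately minimal rigid-body sketch. Rotations are drawn uniformly with Shoemake's unit-quaternion method and translations uniformly in a box; `toy_score` (contacts rewarded, clashes penalized, with assumed 6 Å and 3 Å cutoffs and arbitrary weights) is a hypothetical stand-in for a real scoring function, and coordinates are plain tuples rather than any structure-file format.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Matrix = List[List[float]]


def random_rotation(rng: random.Random) -> Matrix:
    """Uniformly distributed 3x3 rotation matrix via Shoemake's unit-quaternion method."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    y = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def transform(coords: Sequence[Vec], rot: Matrix, shift: Vec) -> List[Vec]:
    """Rotate each point about the origin, then translate it by `shift`."""
    return [
        tuple(sum(rot[i][k] * c[k] for k in range(3)) + shift[i] for i in range(3))
        for c in coords
    ]


def toy_score(rec: Sequence[Vec], lig: Sequence[Vec],
              contact: float = 6.0, clash: float = 3.0) -> float:
    """Hypothetical score: +1 per point pair within `contact`, -10 per pair closer than `clash`."""
    score = 0.0
    for a in rec:
        for b in lig:
            d = math.dist(a, b)
            if d < clash:
                score -= 10.0
            elif d < contact:
                score += 1.0
    return score


def dock(rec: Sequence[Vec], lig: Sequence[Vec], n_poses: int = 500, box: float = 10.0,
         keep: int = 5, seed: int = 0) -> List[Tuple[float, Matrix, Vec]]:
    """Phase (i): sample random rigid-body poses; phase (ii): score them and keep the best."""
    rng = random.Random(seed)
    poses = []
    for _ in range(n_poses):
        rot = random_rotation(rng)
        shift = (rng.uniform(-box, box), rng.uniform(-box, box), rng.uniform(-box, box))
        poses.append((toy_score(rec, transform(lig, rot, shift)), rot, shift))
    poses.sort(key=lambda p: p[0], reverse=True)
    return poses[:keep]
```

The ligand coordinates are assumed to be centred at the origin, so each rotation turns the ligand about its own centre before the translation places it. Practical docking codes typically replace the random search with exhaustive or FFT-based sampling and use far richer energy terms.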

    A NEW METHODOLOGY FOR IDENTIFYING INTERFACE RESIDUES INVOLVED IN BINDING PROTEIN COMPLEXES

    Genome-sequencing projects using advanced technologies have rapidly increased the number of known protein sequences, and demand for identifying protein interaction sites has grown significantly because of its impact on understanding cellular processes, biochemical events and drug design. However, the capacity of current wet-laboratory techniques is not enough to handle the exponentially growing protein sequence data; therefore, sequence-based predictive methods for identifying protein interaction sites have drawn increasing interest. In this article, we propose a new predictive model that can serve as a first approach for guiding experimental methods that investigate protein-protein interactions and localize the specific interface residues. The proposed method extracts a wide range of features from protein sequences. The random forest framework is redesigned to make effective use of these features and to address the imbalanced-data classification problem commonly encountered in binding-site prediction. The method is evaluated on 2,829 interface residues and 24,616 non-interface residues extracted from 99 polypeptide chains in the Protein Data Bank. The experimental results show that the proposed method performs significantly better than two other conventional predictive methods and can reliably predict residues involved in protein interaction sites. As blind tests, the proposed method predicts interaction sites and constructs three protein complexes: the DnaK molecular chaperone system, 1YUW and 1DKG, which provide new insight into the sequence-function relationship. Finally, the robustness of the proposed method is assessed by evaluating the performance of four different ensemble methods.
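The abstract mentions redesigning random forests to cope with imbalanced data (2,829 interface versus 24,616 non-interface residues, roughly 1 : 8.7) but gives no details. A common remedy, assumed here and not necessarily the authors' design, is to split the majority class into chunks about the size of the minority class, train one base learner per balanced subset, and combine their predictions by majority vote. The sketch covers only the partitioning and voting; the base learner itself is left out, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_subsets(positives: Sequence[T], negatives: Sequence[T],
                     seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Split shuffled negatives into chunks about the size of the positive set.

    Each subset pairs all positives with one chunk, so every negative is used
    exactly once and each base learner sees a roughly balanced training set.
    """
    rng = random.Random(seed)
    neg = list(negatives)
    rng.shuffle(neg)
    n_subsets = max(1, round(len(neg) / len(positives)))
    chunks = [neg[i::n_subsets] for i in range(n_subsets)]
    return [(list(positives), chunk) for chunk in chunks]


def majority_vote(votes: Sequence[int]) -> int:
    """1 (interface) only if strictly more than half of the base learners vote 1."""
    return 1 if 2 * sum(votes) > len(votes) else 0
```

With the dataset sizes from the abstract, this yields 9 subsets, each pairing the 2,829 interface residues with 2,735 or 2,736 non-interface residues. Ties in the vote, which can only occur with an even number of learners, resolve to the non-interface class.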

    Storing and analysing biomolecular contacts

    Biomolecular contacts play a crucial role in all areas of life. In particular, protein-protein (PP) interactions are essential for most processes in biological cells. Antigen-antibody recognition, enzyme-substrate binding, hormone-receptor binding, RNA splicing, DNA replication and signal transduction are just some examples of the rich variety of PP interactions. In recent years, modern proteomic methods have helped to provide a better understanding of the complexity within living cells and organisms. More and more sequences of unknown proteins are being deciphered, their functions revealed, their structural details determined, and their roles in the complex network of biological processes uncovered. Contacts between proteins and small molecules, or ligands (PL), constitute the second important group of biomolecular contacts and play an essential role in drug design. The vast increase of such information necessitates databases for easy handling and analysis of the data. We created a database covering both PP and PL interactions for which structural data are available. Using the database, we performed a number of analyses of features of protein-protein complexes, in particular the group of obligate and non-obligate interactions. Combining information from PP and PL complexes, we developed a prediction method for small-molecule binding sites on PP interface sites. Finally, we tested the applicability of physicochemical features of PP interactions for predicting their kinetic parameters.
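A contact database of the kind described needs two pieces: a geometric definition of a contact and a table in which to store and query contacts. The sketch below assumes that two residues are in contact when any pair of their atoms lies within 4 Å (a common but arbitrary cutoff), represents atoms as plain tuples, and uses Python's built-in `sqlite3` module with a hypothetical one-table schema; it is an illustration, not the schema of the database described above.

```python
import math
import sqlite3
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, str, float, float, float]  # (chain, residue id, x, y, z)


def find_contacts(atoms: Sequence[Atom], chain_a: str, chain_b: str,
                  cutoff: float = 4.0) -> List[Tuple[str, str]]:
    """Residue pairs (chain_a residue, chain_b residue) with any atom pair within `cutoff` Å."""
    side_a = [at for at in atoms if at[0] == chain_a]
    side_b = [at for at in atoms if at[0] == chain_b]
    pairs = set()
    for _, res_a, *xyz_a in side_a:
        for _, res_b, *xyz_b in side_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.add((res_a, res_b))
    return sorted(pairs)


def store_contacts(conn: sqlite3.Connection, complex_id: str, chain_a: str, chain_b: str,
                   contacts: Sequence[Tuple[str, str]]) -> None:
    """Insert residue contacts into a single hypothetical `contact` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contact ("
        "complex_id TEXT, chain_a TEXT, res_a TEXT, chain_b TEXT, res_b TEXT)"
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?, ?, ?)",
        [(complex_id, chain_a, ra, chain_b, rb) for ra, rb in contacts],
    )
    conn.commit()


def interface_sizes(conn: sqlite3.Connection) -> Dict[str, int]:
    """Number of distinct interface residues (both sides) per complex."""
    rows = conn.execute(
        "SELECT complex_id, COUNT(DISTINCT res_a) + COUNT(DISTINCT res_b) "
        "FROM contact GROUP BY complex_id"
    )
    return dict(rows.fetchall())
```

Interface size per complex, as returned by `interface_sizes`, is the kind of per-complex feature that can be aggregated in comparisons such as obligate versus non-obligate interactions.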

    Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks.

    Structure comparison tools can be used to align related protein structures, identify structurally conserved and variable regions, and infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences between homologous protein structures are thought to reflect evolutionary plasticity that accommodates diverged sequences. One kind of difference between the 3-D structures of homologous proteins is rigid-body displacement. A glaring example is equivalent α-helical regions of homologous proteins that do not superimpose well because they have different spatial orientations. In a rigid-body superimposition, these regions would appear variable although they may contain local similarity. Also, because of the high spatial deviation in a variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, the most common example being topologically equivalent loops of two homologues that adopt different conformations. In the current study, we present a refined view of the "structurally variable" regions, which may contain local similarity obscured in the global alignment of homologous protein structures. Because the Protein Blocks structural alphabet describes local protein structure precisely, conformational similarity was identified in a substantial number of 'variable' regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we demonstrate how additional information on local backbone structure from Protein Blocks can aid comparative modeling of a loop region.
In addition, our approach can enhance understanding of sequence-structure relationships. This is illustrated through examples where equivalent regions in homologous protein structures share sequence similarity to varying extents but do not preserve local structure.
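In the Protein Blocks approach, each residue is described by the eight backbone dihedral angles of the five-residue window centred on it (ψ(i-2), φ(i-1), ψ(i-1), φ(i), ψ(i), φ(i+1), ψ(i+1), φ(i+2)) and assigned to whichever of the 16 reference blocks minimizes the root mean square deviation on angular values (RMSDA). The sketch below implements that nearest-prototype assignment with periodic angle differences. The two reference vectors are rough toy prototypes (an ideal helix for `m` and an extended strand for `d`), not the published 16 reference angle sets, and marking terminal residues without a full window as `Z` is an assumed convention.

```python
import math
from typing import Dict, List, Sequence

# Toy prototypes only (NOT the published Protein Blocks reference angles).
# Order within each vector: psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2).
TOY_REFERENCES: Dict[str, List[float]] = {
    "m": [-47.0, -57.0] * 4,   # idealized alpha-helix
    "d": [135.0, -120.0] * 4,  # idealized extended strand
}


def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rmsda(window: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square deviation on angular values between two dihedral vectors."""
    return math.sqrt(sum(angular_diff(a, r) ** 2 for a, r in zip(window, reference)) / len(reference))


def assign_block(window: Sequence[float], references: Dict[str, List[float]]) -> str:
    """Name of the reference block closest to the window in RMSDA."""
    return min(references, key=lambda name: rmsda(window, references[name]))


def pb_sequence(phi: Sequence[float], psi: Sequence[float],
                references: Dict[str, List[float]]) -> str:
    """One block letter per residue; 'Z' where the five-residue window is incomplete."""
    n = len(phi)
    letters = []
    for i in range(n):
        if i < 2 or i > n - 3:
            letters.append("Z")
            continue
        window = [psi[i - 2], phi[i - 1], psi[i - 1], phi[i],
                  psi[i], phi[i + 1], psi[i + 1], phi[i + 2]]
        letters.append(assign_block(window, references))
    return "".join(letters)
```

Once both structures are translated into PB strings, 'variable' regions can be compared letter by letter or, for example, with a sequence-style alignment over the PB alphabet, which can expose local similarity that a rigid-body superimposition hides.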

    EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation

    Background: Prediction of DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for this prediction, but most of them do not consider the relationships of evolutionary information between residues. Results: In this paper, we first propose a novel residue encoding method, referred to as Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), which encodes residues by utilizing the relationships of evolutionary information between residues. PDNA-62 and PDNA-224 are used to evaluate PSSM-RT and two existing PSSM encoding methods by five-fold cross-validation. Performance evaluations indicate that PSSM-RT is more effective than previous methods, which confirms that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction. An ensemble learning classifier (EL_PSSM-RT) is also proposed, combining an ensemble learning model with PSSM-RT to better handle the imbalance between binding and non-binding residues in the datasets. EL_PSSM-RT is evaluated by five-fold cross-validation on PDNA-62 and PDNA-224, as well as on two independent datasets, TS-72 and TS-61. Performance comparisons with existing predictors on the four datasets show that EL_PSSM-RT is the best-performing of all the methods compared, with improvements of 0.02-0.07 in MCC, 4.18-21.47% in ST and 0.013-0.131 in AUC. Furthermore, we analyze the importance of the pair relationships extracted by PSSM-RT, and the results confirm the usefulness of PSSM-RT for encoding DNA-binding residues. Conclusions: We propose a novel method for predicting DNA-binding residues that incorporates the relationship of evolutionary information and ensemble learning.
Performance evaluation shows that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction and that ensemble learning can be used to address the data imbalance between binding and non-binding residues. A web service for EL_PSSM-RT (http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/) is freely available to the biological research community.
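The improvements above are reported in MCC, ST and AUC. The abstract does not define ST; the sketch below assumes the "strength" measure (SN + SP)/2 that is commonly reported on PDNA-62, alongside sensitivity (SN), specificity (SP) and the Matthews correlation coefficient (MCC), all computed from a binary confusion matrix. The function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, ST = (SN + SP) / 2 and MCC for binary labels (1 = binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ST": (sn + sp) / 2, "MCC": mcc}
```

MCC is the more informative of these on imbalanced data such as binding versus non-binding residues: a predictor that labels every residue non-binding achieves high accuracy, yet its MCC here is 0.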