    Methods for the Efficient Comparison of Protein Binding Sites and for the Assessment of Protein-Ligand Complexes

    In the present work, accelerated methods for the comparison of protein binding sites, as well as an extended procedure for the assessment of ligand poses in protein binding sites, are presented. Protein binding site comparisons are receptor-based techniques frequently used in the early stages of the drug development process. Binding sites of other proteins that are similar to the binding site of the target protein can hint at possible side effects of a new drug before clinical studies. Moreover, binding site comparisons are used as an idea generator for bioisosteric replacements of individual functional groups of a newly developed drug, and to unravel the function of hitherto orphan proteins. The structural comparison of binding sites is especially useful when applied to distantly related proteins, because a comparison based solely on the amino acid sequence is not sufficient in such cases. Methods for the assessment of ligand poses in protein binding sites are also used in the early phase of drug development, within docking programs. These programs are used to screen entire molecule libraries for possible ligands of a binding site and, furthermore, to estimate the conformation in which a ligand will most likely bind. Using this information, molecule libraries can be filtered for subsequent affinity assays, and molecular structures can be refined with regard to affinity and selectivity.

    Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites : Novel Methods and Computational Analysis

    This thesis studies protein binding sites. To extract information from the space of protein binding sites, the sites must be mapped onto a mathematical space, for example by representing them as vectors, graphs or point clouds. To impose a structure on this space, a distance measure is required, and this thesis introduces one. The distance measure can then be used to extract information by means of data mining techniques.
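One simple way to turn binding sites represented as point clouds into a metric space is the symmetric Hausdorff distance. The sketch below illustrates that general idea under stated assumptions; it is not the distance measure introduced in the thesis. It assumes each site is a finite set of 3D points already superimposed on the other, because the Hausdorff distance is not invariant to rotation or translation. The coordinates are invented.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance, a metric on non-empty finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two toy "binding sites" as already superimposed point clouds (invented coordinates).
site_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
site_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(hausdorff(site_a, site_b))  # 1.0
```

Because it is a metric, such a distance can be plugged directly into distance-based data mining methods like clustering or nearest-neighbour search. Its sensitivity to a single outlying point is one reason more robust measures are often preferred in practice.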

    Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity

    The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found at the sequence level. Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or of the surface region where the protein interacts with other molecules, such as binding partners, substrates and ligands. A comparison of these regions is therefore especially interesting for functional inference, since the structural constraints imposed by the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalytic mechanism. From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be approached using different methodological principles. Global methods compare structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not clear a priori which strategy is more suitable. In this thesis, several conceptually different algorithmic approaches based on local, global and semi-global strategies have been developed for comparing protein structure data, more specifically protein binding pockets. The use of graphs for modeling protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. 
The algorithms developed in this thesis are based on this modeling concept; hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The developed algorithms were mainly designed to allow a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of protein structures. A main motivation was to enable the detection of more remote similarities that are not apparent with more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in structural bioinformatics in order to assess and compare their performance and suitability. Each of the approaches developed during this work improved upon the performance of existing methods in the field. Another major aspect of the experiments was the question of which methodological concept, local, global or a combination of both, offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis.
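A classic local strategy for comparing binding sites modelled as graphs is to build a product (association) graph from two labelled point sets and search it for a maximum clique. The sketch below illustrates that general technique; it is not one of the algorithms developed in the thesis. It assumes each site is a list of hypothetical pseudocentre labels with 3D coordinates. The distance tolerance `tol` is an assumed parameter that loosely mirrors the idea of allowing approximate, flexibility-tolerant matches.

```python
import itertools
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]
Site = List[Tuple[str, Point]]  # (hypothetical pseudocentre label, 3D coordinates)
Node = Tuple[int, int]          # (index in site A, index in site B)

def product_graph(a: Site, b: Site, tol: float) -> Dict[Node, Set[Node]]:
    """Nodes pair equally labelled points; edges join pairs whose intra-site distances agree within tol."""
    nodes = [(i, j) for i, (la, _) in enumerate(a)
             for j, (lb, _) in enumerate(b) if la == lb]
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i == k or j == l:
            continue  # each point may be matched at most once
        if abs(math.dist(a[i][1], a[k][1]) - math.dist(b[j][1], b[l][1])) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj: Dict[Node, Set[Node]]) -> Set[Node]:
    """Bron-Kerbosch with pivoting; returns one largest clique (= largest consistent matching)."""
    best: Set[Node] = set()

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> None:
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

site_a: Site = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
                ("aromatic", (0.0, 4.0, 0.0))]
site_b: Site = [("donor", (10.0, 0.0, 0.0)), ("acceptor", (13.2, 0.0, 0.0)),
                ("aromatic", (10.0, 4.0, 0.0)), ("donor", (20.0, 20.0, 20.0))]
print(sorted(max_clique(product_graph(site_a, site_b, tol=0.5))))
# [(0, 0), (1, 1), (2, 2)]
```

The second site is shifted and slightly distorted, yet all three points are matched because only intra-site distances are compared. Tightening `tol` shrinks the matching, which is the trade-off between rigid and approximate comparison discussed above.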

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

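    Beschreibung von Proteinbindetaschen für Funktionsstudien und de Novo-Design und die Entwicklung von Methoden zur funktionellen Klassifizierung von Proteinfamilien (Description of Protein Binding Pockets for Functional Studies and de Novo Design, and the Development of Methods for the Functional Classification of Protein Families)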

    The analysis of similarity relationships among protein binding pockets is the central theme addressed, in several variations, in this work. The basic assumption behind the presented approach is that the function of a protein is largely determined by the shape and physicochemical properties of its binding pocket. Similarities between protein binding pockets can therefore be used to draw conclusions about protein function. They also help to identify ligands or ligand fragments that bind to related binding pockets, and can thus provide ideas for the design of new inhibitors. Furthermore, similarities in the binding pockets of proteins that are not related in sequence can be used to predict possible cross-reactivities between them. Finally, they form the basis for a functional classification of protein binding pockets. The presented work builds on the Cavbase method, which describes the geometric shape and physicochemical properties of binding pockets and determines similar regions in two binding pockets. The physicochemical properties of the amino acids are expressed by 3D descriptors (pseudocenters). New pseudocenters were introduced for some interaction properties of the amino acids. A comparison of this binding pocket representation with knowledge-based approaches such as SuperStar and DrugScore showed that Cavbase represents the properties of binding pockets very well. By implementing a description of binding pockets as bitstrings, the speed of binding pocket comparisons was increased substantially. Several examples demonstrate the successful use of Cavbase in similarity analysis and in the discovery of related protein binding pockets. 
Furthermore, 26 selected proteins whose function was still unknown at the time their crystal structures were determined (so-called hypothetical proteins) were used to show the possibilities and limitations of the Cavbase approach. Cavbase is able to detect functional similarities to other proteins and to suggest ideas for functional annotation. Similarities in the binding pockets of two proteins can manifest themselves as an undesired side effect. A possible cross-reactivity is relatively easy to estimate for sequence-related structures. Predicting cross-reactivities is more difficult when the proteins under consideration share no sequence or fold homology. In this work, an experimentally observed cross-reactivity could be explained structurally and reproduced with Cavbase. As a second case of possible cross-reactivity, similar regions were found in the binding pockets of carbonic anhydrase and malate dehydrogenase, which make an experimentally observable cross-reactivity structurally plausible. Traditional methods that determine the similarity between proteins or protein binding pockets compare one binding pocket against a large dataset of such pockets and are thereby limited to the analysis of selected binding pockets. In this work, the focus was instead placed on a cluster analysis of large datasets. Here, the emphasis is no longer on the similarity relationships of individual binding pockets to one another, but on the simultaneous analysis of the similarity relationships among many binding pockets. Methods were therefore developed to compare large datasets of binding pockets with one another and to perform similarity and cluster analyses. The use of heuristic filters considerably accelerated binding pocket comparisons. The analysis of protein families is particularly interesting. 
Using two pharmaceutically relevant protein families, the alpha-carbonic anhydrases and the protein kinases, it was shown how similarities and differences in the binding pockets can be used to build a functional classification of these families. Cavbase is able to distinguish the binding pockets under consideration at the level of the protein subfamily.
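Describing binding pockets as bitstrings speeds up comparisons because an expensive geometric comparison is replaced by cheap bitwise operations. The sketch below uses an assumed encoding, not Cavbase's actual bitstring scheme: every pair of hypothetical pseudocentre labels, combined with a binned inter-point distance, sets one bit, and two pockets are compared with the Tanimoto coefficient. The label list, `BIN_WIDTH` and `N_BINS` are illustrative choices, and the coordinates are invented.

```python
import itertools
import math
from typing import List, Tuple

# Hypothetical pseudocentre labels; the real Cavbase type set may differ.
LABELS = ["donor", "acceptor", "donor_acceptor", "aromatic", "aliphatic", "pi", "metal"]
BIN_WIDTH = 2.0  # assumed distance bin width in Angstrom
N_BINS = 8       # distances beyond the last bin are clamped into it

Site = List[Tuple[str, Tuple[float, float, float]]]

def encode(site: Site) -> int:
    """Set one bit per (label pair, distance bin) in the pocket; a Python int serves as the bitstring."""
    bits = 0
    for (la, pa), (lb, pb) in itertools.combinations(site, 2):
        i, j = sorted((LABELS.index(la), LABELS.index(lb)))
        pair_index = i * len(LABELS) + j
        d_bin = min(int(math.dist(pa, pb) / BIN_WIDTH), N_BINS - 1)
        bits |= 1 << (pair_index * N_BINS + d_bin)
    return bits

def tanimoto(a: int, b: int) -> float:
    """|A AND B| / |A OR B| computed from population counts."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two empty bitstrings count as identical
    return bin(a & b).count("1") / union

site_a: Site = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)), ("aromatic", (0.0, 4.0, 0.0))]
site_b: Site = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)), ("aromatic", (0.0, 6.0, 0.0))]
print(tanimoto(encode(site_a), encode(site_b)))  # 0.2
```

Such fingerprints discard the exact 3D arrangement, so they fit best as a fast pre-filter, in the spirit of the heuristic filters mentioned above, before a more exact comparison. On Python 3.10+ `int.bit_count()` can replace `bin(...).count("1")`.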

    Protein Functional Surfaces: Global Shape Matching and Local Spatial Alignments of Ligand Binding Sites

    Background: Protein surfaces comprise only a fraction of the total residues but are the most conserved functional features of proteins. Surfaces performing identical functions are found in proteins lacking any sequence or fold similarity. While biochemical activity can be attributed to a few key residues, the broader surrounding environment plays an equally important role.
    Results: We describe a methodology that attempts to optimize two components, global shape and local physicochemical texture, for evaluating the similarity between a pair of surfaces. Surface shape similarity is assessed using a three-dimensional object recognition algorithm, and physicochemical texture similarity is assessed through a spatial alignment of conserved residues between the surfaces. The comparisons are used in tandem to efficiently search the Global Protein Surface Survey (GPSS), a library of annotated surfaces derived from structures in the PDB, for studying evolutionary relationships and uncovering novel similarities between proteins.
    Conclusion: We provide an assessment of our method using library retrieval experiments for identifying functionally homologous surfaces binding different ligands, functionally diverse surfaces binding the same ligand, and binding surfaces of ubiquitous and conformationally flexible ligands. Results of using surface similarity to predict function for proteins of unknown function are reported. Additionally, an automated analysis of the ATP binding surface landscape is presented to provide insight into the correlation between surface similarity and function for structures in the PDB and for the subset of protein kinases.

    Exact and efficient algorithms for pairwise learning

    Proteins and their interacting partners: an introduction to protein–ligand binding site prediction methods

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time-consuming using in vitro and/or in vivo methods; consequently, the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein–ligand binding sites and protein biochemical functions offer a practical alternative. The characterisation of protein–ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein–ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, and we discuss the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st-century problems.

    eMatchSite: Sequence Order-Independent Structure Alignments of Ligand Binding Pockets in Protein Models

    © 2014 Michal Brylinski. Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques; however, to achieve high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome-scale applications, where only structure models of varying quality are available for the majority of gene products. To improve the state of the art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4–9% lower than for pairs aligned using crystal structures. This represents a significant improvement over other algorithms; for example, the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility of investigating drug-protein interaction networks for complete proteomes, with prospective systems-level applications in polypharmacology and rational drug repositioning. 
eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite
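"Sequence order-independent" means that residues are paired by their spatial position and chemistry rather than by their position along the amino acid chain. The greedy sketch below only illustrates that idea; it is not eMatchSite's algorithm. It assumes the two pockets are already superimposed in a common frame, represents each residue by a single representative point, and uses an assumed 2 Å matching cutoff. All residue numbers and coordinates are invented.

```python
import math
from typing import List, Tuple

# (sequence number, residue name, coordinates of one representative atom)
Residue = Tuple[int, str, Tuple[float, float, float]]

def order_independent_alignment(site_a: List[Residue], site_b: List[Residue],
                                cutoff: float = 2.0) -> List[Tuple[int, int]]:
    """Greedily pair same-type residues by spatial proximity, ignoring sequence order."""
    candidates = []
    for ia, (_, name_a, xyz_a) in enumerate(site_a):
        for ib, (_, name_b, xyz_b) in enumerate(site_b):
            d = math.dist(xyz_a, xyz_b)
            if name_a == name_b and d <= cutoff:
                candidates.append((d, ia, ib))
    candidates.sort()  # closest candidate pairs first
    used_a, used_b = set(), set()
    pairs = []
    for _, ia, ib in candidates:
        if ia not in used_a and ib not in used_b:  # keep the mapping one-to-one
            used_a.add(ia)
            used_b.add(ib)
            pairs.append((site_a[ia][0], site_b[ib][0]))
    return sorted(pairs)

pocket_a = [(10, "ASP", (0.0, 0.0, 0.0)), (45, "LYS", (5.0, 0.0, 0.0)), (80, "PHE", (0.0, 5.0, 0.0))]
pocket_b = [(5, "PHE", (0.3, 5.0, 0.0)), (30, "ASP", (0.5, 0.0, 0.0)), (60, "LYS", (5.0, 0.4, 0.0))]
print(order_independent_alignment(pocket_a, pocket_b))
# [(10, 30), (45, 60), (80, 5)]
```

Residue 80 of the first pocket pairs with residue 5 of the second; an alignment that preserves sequence order could not produce this pair together with the other two.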

    Concepts to Interfere with Protein-Protein Complex Formations: Data Analysis, Structural Evidence and Strategies for Finding Small Molecule Modulators

    (1) Analyzing protein-protein interactions at the atomic level is critical for our understanding of the principles governing protein-protein recognition. For this purpose, descriptors that explain the nature of different protein-protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification (EPIC), a framework that handles the preparation, processing, and analysis of protein-protein complexes for classification with machine learning algorithms. We applied four machine learning algorithms, Support Vector Machines (SVM), C4.5 Decision Trees, K Nearest Neighbors (KNN), and Naïve Bayes (NB), in combination with three feature selection methods, Filter (Relief F), Wrapper, and Genetic Algorithms (GA), to extract discriminating features from the protein-protein complexes. To compare protein-protein complexes with each other, we represented the physicochemical characteristics of their interfaces in four different ways: two different atomic contact vectors (ACVs), DrugScore pair potential vectors (DPV) and SFCscore descriptor vectors (SDV). We classified two datasets: (A) 172 protein-protein complexes, comprising 96 monomers forming contacts enforced by the crystallographic packing environment (crystal contacts) and 76 biologically functional homodimer complexes; (B) 345 protein-protein complexes, comprising 147 permanent and 198 transient complexes. We were able to classify up to 94.8% of the packing-enforced/functional and up to 93.6% of the permanent/transient complexes correctly. Furthermore, we were able to extract relevant features from the different protein-protein complexes and introduce an approach for scoring the importance of the extracted features. 
(2) Since protein-protein interactions play a pivotal role in molecular-level communication in virtually every biological system and process, the search for and design of modulators of such interactions is of utmost interest. In recent years, many inhibitors of specific protein-protein interactions have been developed; however, only in a few cases are small, drug-like molecules able to interfere with the complex formation of proteins. On the other hand, several small molecules are known to modulate protein-protein interactions by stabilizing an already assembled complex. To achieve this, the ligand binds to a pocket located rim-exposed at the interface of the interacting proteins; an example is the phytotoxin Fusicoccin, which stabilizes the interaction of plant H+-ATPase and 14-3-3 protein by nearly a factor of 100. To suggest alternative leads, we performed a virtual screening campaign to discover new molecules that putatively stabilize this complex. Furthermore, we screened a dataset of 198 transient recognition protein-protein complexes for cavities located rim-exposed at their interfaces. We provide evidence for high similarity between such rim-exposed cavities and the typical ligand-accommodating active sites of enzymes. This analysis suggests that rim-exposed cavities at protein-protein interfaces are druggable targets. Therefore, stabilizing protein-protein interactions appears to be a promising alternative to the competitive inhibition of such interactions by small molecules. (3) AffinDB is a database of affinity data for structurally resolved protein-ligand complexes from the PDB. It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references that report affinity values determined by others. 
AffinDB currently contains over 730 affinity entries covering more than 450 different protein-ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); the 2D drawing, SMILES code, and molecular weight of the ligand; links to other databases; and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal, year). Search results can be saved as tabular reports in text files. The database is intended to be a valuable resource for researchers interested in biomolecular recognition and in the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.
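The query options listed for AffinDB amount to a conjunction of optional range filters, and the export is a tab-separated table. The sketch below is a hypothetical in-memory illustration of that kind of query. It does not use AffinDB's actual interface or schema; the field names are assumptions, publication data are reduced to the year, and the entries are placeholder values, not real AffinDB records. An unset bound means "no constraint", and an entry whose condition was not reported fails any active bound on that condition.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import List, Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]  # (lower, upper); None = unbounded

@dataclass
class AffinityEntry:  # hypothetical schema, not AffinDB's
    pdb_code: str
    affinity: float                 # assumed to be stored in pK units, -log10(K)
    temperature_k: Optional[float]  # None if not reported in the reference
    ph: Optional[float]
    ligand_mw: float
    year: int

def within(value: Optional[float], bounds: Range) -> bool:
    lo, hi = bounds
    if lo is None and hi is None:
        return True   # criterion not used in this query
    if value is None:
        return False  # an unreported condition cannot satisfy an active bound
    return (lo is None or value >= lo) and (hi is None or value <= hi)

def query(entries: List[AffinityEntry], pdb_code: Optional[str] = None,
          affinity: Range = (None, None), temperature_k: Range = (None, None),
          ph: Range = (None, None), ligand_mw: Range = (None, None),
          year: Range = (None, None)) -> List[AffinityEntry]:
    """Return entries matching every given criterion (case-insensitive PDB code)."""
    return [e for e in entries
            if (pdb_code is None or e.pdb_code.lower() == pdb_code.lower())
            and within(e.affinity, affinity) and within(e.temperature_k, temperature_k)
            and within(e.ph, ph) and within(e.ligand_mw, ligand_mw)
            and within(e.year, year)]

def to_tsv(entries: List[AffinityEntry]) -> str:
    """Tabular text report; missing values become empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([f.name for f in fields(AffinityEntry)])
    writer.writerows(astuple(e) for e in entries)
    return buf.getvalue()

# Placeholder entries with invented values (not real AffinDB data).
entries = [
    AffinityEntry("demo1", 7.2, 298.0, 7.4, 350.4, 2003),
    AffinityEntry("demo2", 5.1, None, 6.0, 212.2, 1999),
    AffinityEntry("demo3", 8.0, 310.0, None, 498.6, 2005),
]
hits = query(entries, affinity=(6.0, None), ph=(7.0, 8.0))
print(to_tsv(hits), end="")
```

Here `demo2` fails the affinity bound and `demo3` fails the pH bound because its pH was not reported, so only `demo1` is exported. Whether missing conditions should instead pass a filter is a design choice a real database would have to document.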