69 research outputs found

    PocketPicker: analysis of ligand binding-sites with shape descriptors

    Background: Identification and evaluation of surface binding-pockets and occluded cavities are initial steps in protein structure-based drug design. Characterizing the active site's shape as well as the distribution of surrounding residues plays an important role for a variety of applications such as automated ligand docking or in situ modeling. Comparing the shape similarity of binding site geometries of related proteins provides further insights into the mechanisms of ligand binding. Results: We present PocketPicker, an automated grid-based technique for the prediction of protein binding pockets that specifies the shape of a potential binding-site with regard to its buriedness. The method was applied to a representative set of protein-ligand complexes and their corresponding apo-protein structures to evaluate the quality of binding-site predictions. The performance of the pocket detection routine was compared to results achieved with the existing methods CAST, LIGSITE, LIGSITEcs, PASS and SURFNET. Success rates for PocketPicker were comparable to those of LIGSITEcs and exceeded those of the other tools. We introduce a descriptor that translates the arrangement of grid points delineating a detected binding-site into a correlation vector. We show that this shape descriptor is suited for comparative analyses of similar binding-site geometry by examining induced-fit phenomena in aldose reductase. This new method uses information derived from calculations of the buriedness of potential binding-sites. Conclusions: The pocket prediction routine of PocketPicker is a useful tool for identification of potential protein binding-pockets. It produces a convenient representation of binding-site shapes including an intuitive description of their accessibility. The shape-descriptor for automated classification of binding-site geometries can be used as an additional tool complementing elaborate manual inspections.
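
    The grid-based buriedness idea in the abstract above can be illustrated with a small sketch. It is not the PocketPicker implementation: the ray count, ray length, atom radius and the Fibonacci direction set are all assumptions chosen for illustration, and the function names are hypothetical. A probe point is scored by how many rays cast from it are blocked by a nearby atom, so a deeply buried point scores high and an exposed point scores low.

```python
import math

def fibonacci_directions(n):
    """Return n roughly evenly spread unit vectors on a sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden * i
        dirs.append((r * math.cos(theta), r * math.sin(theta), z))
    return dirs

def ray_hits_atom(origin, direction, atoms, ray_length, atom_radius):
    """True if the ray segment passes within atom_radius of any atom centre."""
    for atom in atoms:
        v = [a - o for a, o in zip(atom, origin)]
        # Projection of the atom centre onto the ray, clamped to the segment.
        t = sum(vi * di for vi, di in zip(v, direction))
        t = min(max(t, 0.0), ray_length)
        closest = [o + t * d for o, d in zip(origin, direction)]
        if math.dist(closest, atom) <= atom_radius:
            return True
    return False

def buriedness(point, atoms, n_rays=30, ray_length=10.0, atom_radius=1.8):
    """Count blocked rays (0 = fully exposed, n_rays = fully enclosed).

    All default parameter values are illustrative assumptions.
    """
    dirs = fibonacci_directions(n_rays)
    return sum(ray_hits_atom(point, d, atoms, ray_length, atom_radius)
               for d in dirs)
```

    In a full pipeline one would evaluate such a score on every grid point outside the protein and cluster neighbouring high-scoring points into candidate pockets; that clustering step is not shown here.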

    Computation of protein geometry and its applications: Packing and function prediction

    This chapter discusses geometric models of biomolecules and geometric constructs, including the union of ball model, the weighted Voronoi diagram, the weighted Delaunay triangulation, and the alpha shapes. These geometric constructs enable fast and analytical computation of shapes of biomolecules (including features such as voids and pockets) and metric properties (such as area and volume). The algorithms of Delaunay triangulation, computation of voids and pockets, as well as volume/area computation are also described. In addition, applications in packing analysis of protein structures and protein function prediction are discussed. Comment: 32 pages, 9 figures
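
    As a minimal illustration of analytical volume computation in the union of ball model, the sketch below handles only the two-ball case by inclusion-exclusion, using the closed-form volume of a sphere-sphere intersection. It is not taken from the chapter; the function names are hypothetical, and alpha-shape methods are what extend this kind of bookkeeping to many overlapping atoms.

```python
import math

def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3

def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0                      # balls do not overlap
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))  # smaller ball lies inside the larger
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)) / (12.0 * d)

def union_volume_two_balls(c1, r1, c2, r2):
    """Inclusion-exclusion: V(A or B) = V(A) + V(B) - V(A and B)."""
    d = math.dist(c1, c2)
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)
```

    For example, `union_volume_two_balls((0, 0, 0), 1.0, (1, 0, 0), 1.0)` evaluates the union of two unit balls whose centres are one unit apart.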

    Shelling the Voronoi interface of protein-protein complexes predicts residue activity and conservation

    The accurate description of protein-protein interfaces remains a challenging task. Traditional criteria, based on atomic contacts or changes in solvent accessibility, tend to over- or underpredict the interface itself and cannot discriminate active from less relevant parts. A recent simulation study by Mihalek and co-authors (2007, JMB 369, 584-95) concluded that active residues tend to be 'dry', that is, insulated from water fluctuations. We show that patterns of 'dry' residues can, to a large extent, be predicted by a fast, parameter-free and purely geometric analysis of protein interfaces. We introduce the shelling order of Voronoi facets as a straightforward quantitative measure of an atom's depth inside an interface. We analyze the correlation between Voronoi shelling order, dryness, and conservation on a set of 54 protein-protein complexes. Residues with high shelling order tend to be dry; evolutionary conservation also correlates with dryness and shelling order but, perhaps not surprisingly, is a much less accurate predictor of either property. Voronoi shelling order thus seems a meaningful and efficient descriptor of protein interfaces. Moreover, the strong correlation with dryness suggests that water dynamics within protein interfaces may, in first approximation, be described by simple diffusion models.
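
    A depth measure of this kind can be sketched as a multi-source breadth-first search. The sketch assumes, without confirmation from the abstract, that shelling order is the graph distance of an interface facet from the interface boundary, with boundary facets at order 1. The facet adjacency map is hypothetical input; computing it from an actual Voronoi diagram is not shown.

```python
from collections import deque

def shelling_order(adjacency, boundary):
    """Assign each facet its BFS depth from the boundary facets.

    adjacency: dict mapping a facet id to an iterable of neighbouring facet ids.
    boundary:  iterable of facet ids on the rim of the interface (order 1,
               an assumed convention).
    """
    order = {facet: 1 for facet in boundary}
    queue = deque(order)
    while queue:
        facet = queue.popleft()
        for neighbour in adjacency.get(facet, ()):
            if neighbour not in order:
                order[neighbour] = order[facet] + 1
                queue.append(neighbour)
    return order
```

    Under this assumption, facets in the core of an interface receive the largest values, which is the property the abstract correlates with dryness.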

    MGOS: A library for molecular geometry and its operating system

    The geometry of atomic arrangement underpins the structural understanding of molecules in many fields. However, no general framework of mathematical/computational theory for the geometry of atomic arrangement exists. Here we present "Molecular Geometry (MG)" as a theoretical framework accompanied by "MG Operating System (MGOS)", which consists of callable functions implementing the MG theory. MG allows researchers to model complicated molecular structure problems in terms of elementary yet standard notions of volume, area, etc., and MGOS frees them from the hard and tedious task of developing and implementing geometric algorithms so that they can focus more on their primary research issues. MG facilitates simpler modeling of molecular structure problems; MGOS functions can be conveniently embedded in application programs for the efficient and accurate solution of geometric queries involving atomic arrangements. The use of MGOS in problems involving spherical entities is akin to the use of math libraries in general purpose programming languages in science and engineering. (C) 2019 The Author(s). Published by Elsevier B.V.

    Geometric algorithms for cavity detection on protein surfaces

    Macromolecular structures such as proteins drive cellular processes and functions. These biological functions result from interactions between proteins and peptides, catalytic substrates, nucleotides or even human-made chemicals. Thus, several interactions can be distinguished: protein-ligand, protein-protein, protein-DNA, and so on. Furthermore, those interactions only happen under chemical- and shape-complementarity conditions, and usually take place in regions known as binding sites. Typically, a protein consists of four structural levels. The primary structure of a protein is made up of its amino acid sequences (or chains). Its secondary structure essentially comprises α-helices and β-sheets, which are sub-sequences (or sub-domains) of amino acids of the primary structure. Its tertiary structure results from the composition of sub-domains into domains, which represent the geometric shape of the protein. Finally, the quaternary structure of a protein results from the aggregate of two or more tertiary structures, usually known as a protein complex. This thesis fits in the scope of structure-based drug design and protein docking. Specifically, it addresses the fundamental problem of detecting and identifying protein cavities, which are often seen as tentative binding sites for ligands in protein-ligand interactions. In general, cavity prediction algorithms split into three main categories: energy-based, geometry-based, and evolution-based. Evolutionary methods build upon evolutionary sequence conservation estimates; that is, these methods allow us to detect functional sites through the computation of the evolutionary conservation of the positions of amino acids in proteins. Energy-based methods build upon the computation of interaction energies between protein and ligand atoms. In turn, geometry-based algorithms build upon the analysis of the geometric shape of the protein (i.e., its tertiary structure) to identify cavities.
This thesis focuses on geometric methods. We introduce here three new geometry-based algorithms for protein cavity detection. The main contribution of this thesis lies in the use of computer graphics techniques in the analysis and recognition of cavities in proteins, much in the spirit of molecular graphics and modeling. As seen further ahead, these techniques include field-of-view (FoV), voxel ray casting, back-face culling, shape diameter functions, Morse theory, and critical points. The leading idea is to come up with protein shape segmentation, much like we commonly do in mesh segmentation in computer graphics. In practice, protein cavity algorithms are nothing more than segmentation algorithms designed for proteins.

    Current state-of-the-art of the research conducted in mapping protein cavities – binding sites of bioactive compounds, peptides or other proteins

    The aim of this thesis was to report on the current state-of-the-art of the research conducted concerning mapping of protein cavities with a potential functional role as binding sites of bioactive compounds, peptides or other proteins. A literature review was performed with emphasis on the relevant tools developed during the last decade. In addition, the main research findings regarding drug design and druggable targets based on binding sites are presented. Processes performed in protein cavity detection and analysis in previous research articles are compared with the approach described by Anaxagoras Fotopoulos and Athanasios Papathanasiou (2015). The results showed that a competitive advantage of their approach is the multidimensional k-means algorithm for clustering. For the bibliographic review the scientific knowledge base was used, which includes international articles and journals, book chapters, as well as online articles regarding drug design and protein cavities. Search keywords such as protein cavity dynamics, catalytic sites of enzymes, protein pocket etc. were used to identify bioinformatics tools with text mining. A catalogue of the most recently developed tools is presented, followed by a brief description of selected tools. The selection criteria imposed for preparing the catalogue and the detailed description included the publication date, as well as the algorithms and the methods they use. The tools were then classified according to the search keywords. The findings of this research are discussed, and the algorithms and methods they use are compared, highlighting the advantages of protein cavity detection.

    Protein Binding Ligand Prediction Using Moments-Based Methods

    Structural genomics initiatives have started to accumulate protein structures of unknown function at an increasing pace. Conventional sequence-based function prediction methods are not able to provide useful function information for most of these structures. Thus, structure-based approaches have been developed, which predict the function of proteins by capturing structural characteristics of functional sites. In particular, several approaches have been proposed to identify potential ligand binding sites in a query protein structure and to compare them with known ligand binding sites. In this chapter, we introduce computational methods for describing and comparing ligand binding sites using two-dimensional and three-dimensional moments. An advantage of moment-based methods is that the tertiary structure of pocket shapes is described compactly as a vector of coefficients of a series expansion. Thus a search against an entire PDB-scale database can be performed in real time. We evaluate two binding pocket representations, one based on two-dimensional pseudo-Zernike moments and the other based on three-dimensional Zernike moments. A new development of the pocket comparison method is also mentioned, which allows partial matching of pockets by using local patch descriptors.

    Development and Improvement of Tools and Algorithms for the Problem of Atom Type Perception and for the Assessment of Protein-Ligand-Complex Geometries

    In the context of the present work, a scoring function for protein-ligand complexes has been developed, aimed not at affinity prediction but at a good recognition rate of near-native geometries. The developed program DSX makes use of the same formalism as the knowledge-based scoring function DrugScore, hence using the knowledge from crystallographic databases and atom-type specific distance-dependent distribution functions. It is based on newly defined atom types. Additionally, the program is augmented by two novel potentials which evaluate the torsion angles and (de-)solvation effects. Validation of DSX is based on a literature-known, comprehensive data set that allows for comparison with other popular scoring functions. DSX is intended for the recognition of near-native binding modes. In this important task, DSX outperforms the competitors, but is also among the best scoring functions regarding the ranking of different compounds. Another essential step in the development of DSX was the automatic assignment of the new atom types. A powerful programming framework was implemented to fulfill this task. Validation was done on a literature-known data set and showed superior efficiency and quality compared to similar programs where this data was available. The front-end fconv was developed to share this functionality with the scientific community. Multiple features useful in computational drug-design workflows are also included, and fconv was made freely available as an open source project. Based on the developed potentials for DSX, a number of further applications were created and implemented: The program HotspotsX calculates favorable interaction fields in protein binding pockets that can be used as a starting point for pharmacophoric models and that indicate possible directions for the optimization of lead structures. The program DSFP calculates scores based on fingerprints for given binding geometries.
These fingerprints are compared with reference fingerprints that are derived from DSX interactions in known crystal structures of the particular target. Finally, the program DSX_wat was developed to predict stable water networks within a binding pocket. DSX interaction fields are used to calculate the putative water positions

    Analysis of shape, properties and "druggability" of protein binding pockets

    Knowledge of the three-dimensional structure of therapeutically relevant target proteins provides valuable information for rational drug design. The constantly increasing number of available crystal structures enables qualitative and quantitative analysis of specific protein-ligand interactions in silico. In this work novel algorithms for the identification and the comparison of protein binding sites and their properties were developed and combined in the program PocketomePicker. The software combines the routines PocketPicker, PocketShapelets and PocketGraph. Furthermore, the method ReverseLIQUID was re-implemented in this work and used for structure-based virtual screening with a cooperation partner. The programs and their scientific applications are summarized here: The method PocketPicker is designed for the prediction of potential binding sites on protein surfaces. The technique implements a geometric approach based on the concept of "artificial grids" for the identification of continuous buried regions of the protein surface that might act as potential ligand binding sites. The method yields correct predictions of the actual binding site for 73 % of the entries in a representative data set of protein structures. For 90 % of the proteins the actual binding site is found among the top three predicted binding pockets. PocketPicker exceeds the predictive quality of other established algorithms and enables correct binding site identifications on apo structures without significant drops of the prediction success. This is not achieved by other programs. PocketPicker enables alignment-free comparisons of binding site shapes by encoding extracted binding volumes as correlation vectors.
This approach was used for successful predictions of binding site functionality for homology models of APOBEC3C and glutamate dehydrogenase of the malaria pathogen Plasmodium falciparum. These projects were carried out with collaboration partners. Furthermore, PocketPicker correlation descriptors were used for automated analysis of binding site conformations of aldose reductase active sites. The method PocketShapelets was implemented in this work for detailed analysis of shapes and physicochemical properties of protein binding sites. This approach enables structural alignments of extracted binding volumes by surface decomposition of protein binding sites. The structural superposition is achieved by identification of structurally similar surface curvatures of different binding pockets. PocketShapelets was successfully used for the analysis of functional similarity of binding sites based on observations of physicochemical properties. PocketGraph was developed for the analysis of the structural diversity of binding site geometries. This approach uses the “Growing Neural Gas” concept used in machine learning for an automated extraction of the structural organization of binding sites. Furthermore, the method enables the decomposition of binding sites into subpockets. The pocket volumes characterized by PocketPicker are the foundation of another program called ReverseLIQUID. This method was refined in this work and used for the identification of a Helicobacter pylori serine protease HtrA inhibitor. This project was performed with a collaboration partner. A receptor-based pharmacophore model was derived using ReverseLIQUID and used for virtual screening. This approach led to the identification of a substance with low micromolar affinity towards the target protein

    Computations to Obtain Wider Tunnels in Protein Structures

    Finding wide tunnels in protein structures is an important problem in structural bioinformatics with applications in various areas such as drug design. Several algorithms have been proposed for finding wide tunnels in a fixed protein conformation. However, to the best of our knowledge, none of the existing work has considered widening the tunnel, i.e., finding a wider tunnel in an alternative conformation of the given structure. In this thesis we initiate this line of research by proposing a tunnel-widening algorithm which aims to make the tunnel wider by a slight local change in the structure of the protein. Given a fixed conformation of a protein with a point located inside it, we first describe an algorithm to identify the widest tunnel from that point to the outside environment of the protein. Then we try to make the tunnel wider by considering various alternative conformations of the protein. We only consider conformations whose energies are not much higher than the energy of the initial conformation. Among these alternative conformations we select the one with the widest tunnel. However, the alternative conformation with the widest tunnel might not be accessible from the initial structure. Thus, in the next step we develop three algorithms for finding a feasible transition pathway from the initial structure to the alternative conformation, i.e., a sequence of intermediate conformations between the initial structure and the alternative conformation such that the energy values of all these intermediate conformations are close to the energy of the initial structure. We evaluate our tunnel-finding and tunnel-widening algorithms on various proteins. Our experiments show that in most cases we can make the tunnel wider in an alternative conformation. However, there are cases in which we find a wider tunnel in an alternative conformation, but the energy value of the alternative conformation is much higher than the energy of the initial structure.
We also implemented our three pathway-finding algorithms and tested them on various instances. Our experiments show that although in most cases we can find a feasible transition pathway, there are cases in which the alternative conformation has energy close to the initial structure, but our algorithms cannot find any feasible pathway from the initial structure to the alternative conformation. Furthermore, there is a trade-off between the running time and accuracy of the three pathway-finding algorithms
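
    The abstract above does not specify how the widest tunnel is found, so the following is only a sketch of one standard formulation: a widest-path (maximum bottleneck) search with a Dijkstra-style priority queue. It assumes the empty space of the protein has already been turned into a graph whose edge weights are clearance widths; that graph, the node names and the function name are hypothetical.

```python
import heapq

def widest_path(graph, source, targets):
    """Return (bottleneck_width, path) for the widest route to any target.

    graph:   dict mapping a node to a list of (neighbour, width) pairs.
    source:  start node, e.g. a point inside the protein.
    targets: set of nodes that count as "outside".
    Returns (0.0, []) if no target is reachable.
    """
    best = {source: float("inf")}
    parent = {source: None}
    heap = [(-float("inf"), source)]    # max-heap via negated widths
    done = set()
    while heap:
        neg_width, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return -neg_width, path[::-1]
        for neighbour, width in graph.get(node, []):
            # The width of a route is its narrowest edge.
            candidate = min(-neg_width, width)
            if candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                parent[neighbour] = node
                heapq.heappush(heap, (-candidate, neighbour))
    return 0.0, []
```

    Re-running such a search on each candidate conformation and keeping the one with the largest bottleneck would correspond loosely to the selection step described in the abstract; the energy filter and the pathway-finding algorithms are not modelled here.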