120 research outputs found

    PocketPicker: analysis of ligand binding-sites with shape descriptors

    Get PDF
    Background Identification and evaluation of surface binding-pockets and occluded cavities are initial steps in protein structure-based drug design. Characterizing the active site's shape as well as the distribution of surrounding residues plays an important role for a variety of applications such as automated ligand docking or in situ modeling. Comparing the shape similarity of binding site geometries of related proteins provides further insights into the mechanisms of ligand binding. Results We present PocketPicker, an automated grid-based technique for the prediction of protein binding pockets that specifies the shape of a potential binding-site with regard to its buriedness. The method was applied to a representative set of protein-ligand complexes and their corresponding apo-protein structures to evaluate the quality of binding-site predictions. The performance of the pocket detection routine was compared to results achieved with the existing methods CAST, LIGSITE, LIGSITEcs, PASS and SURFNET. Success rates PocketPicker were comparable to those of LIGSITEcs and outperformed the other tools. We introduce a descriptor that translates the arrangement of grid points delineating a detected binding-site into a correlation vector. We show that this shape descriptor is suited for comparative analyses of similar binding-site geometry by examining induced-fit phenomena in aldose reductase. This new method uses information derived from calculations of the buriedness of potential binding-sites. Conclusions The pocket prediction routine of PocketPicker is a useful tool for identification of potential protein binding-pockets. It produces a convenient representation of binding-site shapes including an intuitive description of their accessibility. The shape-descriptor for automated classification of binding-site geometries can be used as an additional tool complementing elaborate manual inspections

    Fpocket: An open source platform for ligand pocket detection

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Virtual screening methods start to be well established as effective approaches to identify hits, candidates and leads for drug discovery research. Among those, structure based virtual screening (SBVS) approaches aim at docking collections of small compounds in the target structure to identify potent compounds. For SBVS, the identification of candidate pockets in protein structures is a key feature, and the recent years have seen increasing interest in developing methods for pocket and cavity detection on protein surfaces.</p> <p>Results</p> <p>Fpocket is an open source pocket detection package based on Voronoi tessellation and alpha spheres built on top of the publicly available package Qhull. The modular source code is organised around a central library of functions, a basis for three main programs: (i) Fpocket, to perform pocket identification, (ii) Tpocket, to organise pocket detection benchmarking on a set of known protein-ligand complexes, and (iii) Dpocket, to collect pocket descriptor values on a set of proteins. Fpocket is written in the C programming language, which makes it a platform well suited for the scientific community willing to develop new scoring functions and extract various pocket descriptors on a large scale level. Fpocket 1.0, relying on a simple scoring function, is able to detect 94% and 92% of the pockets within the best three ranked pockets from the holo and apo proteins respectively, outperforming the standards of the field, while being faster.</p> <p>Conclusion</p> <p>Fpocket provides a rapid, open source and stable basis for further developments related to protein pocket detection, efficient pocket descriptor extraction, or drugablity prediction purposes. Fpocket is freely available under the GNU GPL license at <url>http://fpocket.sourceforge.net</url>.</p

    Geometric algorithms for cavity detection on protein surfaces

    Get PDF
    Macromolecular structures such as proteins heavily empower cellular processes or functions. These biological functions result from interactions between proteins and peptides, catalytic substrates, nucleotides or even human-made chemicals. Thus, several interactions can be distinguished: protein-ligand, protein-protein, protein-DNA, and so on. Furthermore, those interactions only happen under chemical- and shapecomplementarity conditions, and usually take place in regions known as binding sites. Typically, a protein consists of four structural levels. The primary structure of a protein is made up of its amino acid sequences (or chains). Its secondary structure essentially comprises -helices and -sheets, which are sub-sequences (or sub-domains) of amino acids of the primary structure. Its tertiary structure results from the composition of sub-domains into domains, which represent the geometric shape of the protein. Finally, the quaternary structure of a protein results from the aggregate of two or more tertiary structures, usually known as a protein complex. This thesis fits in the scope of structure-based drug design and protein docking. Specifically, one addresses the fundamental problem of detecting and identifying protein cavities, which are often seen as tentative binding sites for ligands in protein-ligand interactions. In general, cavity prediction algorithms split into three main categories: energy-based, geometry-based, and evolution-based. Evolutionary methods build upon evolutionary sequence conservation estimates; that is, these methods allow us to detect functional sites through the computation of the evolutionary conservation of the positions of amino acids in proteins. Energy-based methods build upon the computation of interaction energies between protein and ligand atoms. In turn, geometry-based algorithms build upon the analysis of the geometric shape of the protein (i.e., its tertiary structure) to identify cavities. This thesis focuses on geometric methods. We introduce here three new geometric-based algorithms for protein cavity detection. The main contribution of this thesis lies in the use of computer graphics techniques in the analysis and recognition of cavities in proteins, much in the spirit of molecular graphics and modeling. As seen further ahead, these techniques include field-of-view (FoV), voxel ray casting, back-face culling, shape diameter functions, Morse theory, and critical points. The leading idea is to come up with protein shape segmentation, much like we commonly do in mesh segmentation in computer graphics. In practice, protein cavity algorithms are nothing more than segmentation algorithms designed for proteins.Estruturas macromoleculares tais como as proteínas potencializam processos ou funções celulares. Estas funções resultam das interações entre proteínas e peptídeos, substratos catalíticos, nucleótideos, ou até mesmo substâncias químicas produzidas pelo homem. Assim, há vários tipos de interacções: proteína-ligante, proteína-proteína, proteína-DNA e assim por diante. Além disso, estas interações geralmente ocorrem em regiões conhecidas como locais de ligação (binding sites, do inglês) e só acontecem sob condições de complementaridade química e de forma. É também importante referir que uma proteína pode ser estruturada em quatro níveis. A estrutura primária que consiste em sequências de aminoácidos (ou cadeias), a estrutura secundária que compreende essencialmente por hélices e folhas , que são subsequências (ou subdomínios) dos aminoácidos da estrutura primária, a estrutura terciária que resulta da composição de subdomínios em domínios, que por sua vez representa a forma geométrica da proteína, e por fim a estrutura quaternária que é o resultado da agregação de duas ou mais estruturas terciárias. Este último nível estrutural é frequentemente conhecido por um complexo proteico. Esta tese enquadra-se no âmbito da conceção de fármacos baseados em estrutura e no acoplamento de proteínas. Mais especificamente, aborda-se o problema fundamental da deteção e identificação de cavidades que são frequentemente vistos como possíveis locais de ligação (putative binding sites, do inglês) para os seus ligantes (ligands, do inglês). De forma geral, os algoritmos de identificação de cavidades dividem-se em três categorias principais: baseados em energia, geometria ou evolução. Os métodos evolutivos baseiam-se em estimativas de conservação das sequências evolucionárias. Isto é, estes métodos permitem detectar locais funcionais através do cálculo da conservação evolutiva das posições dos aminoácidos das proteínas. Em relação aos métodos baseados em energia estes baseiam-se no cálculo das energias de interação entre átomos da proteína e do ligante. Por fim, os algoritmos geométricos baseiam-se na análise da forma geométrica da proteína para identificar cavidades. Esta tese foca-se nos métodos geométricos. Apresentamos nesta tese três novos algoritmos geométricos para detecção de cavidades em proteínas. A principal contribuição desta tese está no uso de técnicas de computação gráfica na análise e reconhecimento de cavidades em proteínas, muito no espírito da modelação e visualização molecular. Como pode ser visto mais à frente, estas técnicas incluem o field-of-view (FoV), voxel ray casting, back-face culling, funções de diâmetro de forma, a teoria de Morse, e os pontos críticos. A ideia principal é segmentar a proteína, à semelhança do que acontece na segmentação de malhas em computação gráfica. Na prática, os algoritmos de detecção de cavidades não são nada mais que algoritmos de segmentação de proteínas

    AtomLbs: An Atom Based Convolutional Neural Network for Druggable Ligand Binding Site Prediction

    Get PDF
    Despite advances in drug research and development, there are few and ineffective treatments for a variety of diseases. Virtual screening can drastically reduce costs and accelerate the drug discovery process. Binding site identification is one of the initial and most important steps in structure-based virtual screening. Identifying and defining protein cavities that are likely to bind to a small compound is the objective of this task. In this research, we propose four different convolutional neural networks for predicting ligand-binding sites in proteins. A parallel optimized data pipeline is created to enable faster training of these neural network models on minimal hardware. The effectiveness of each method is assessed on well-established ligand binding site datasets. It is then compared with the state-of-the-art and widely used methods for ligand binding site identification. The result shows that our methods outperform most of the other methods and are comparable to the state-of-the-art methods

    A Comprehensive Mapping of the Druggable Cavities within the SARS-CoV-2 Therapeutically Relevant Proteins by Combining Pocket and Docking Searches as Implemented in Pockets 2.0

    Get PDF
    (1) Background: Virtual screening studies on the therapeutically relevant proteins of the severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) require a detailed characterization of their druggable binding sites, and, more generally, a convenient pocket mapping represents a key step for structure-based in silico studies; (2) Methods: Along with a careful literature search on SARS-CoV-2 protein targets, the study presents a novel strategy for pocket mapping based on the combination of pocket (as performed by the well-known FPocket tool) and docking searches (as performed by PLANTS or AutoDock/Vina engines); such an approach is implemented by the Pockets 2.0 plug-in for the VEGA ZZ suite of programs; (3) Results: The literature analysis allowed the identification of 16 promising binding cavities within the SARS-CoV-2 proteins and the here proposed approach was able to recognize them showing performances clearly better than those reached by the sole pocket detection; and (4) Conclusions: Even though the presented strategy should require more extended validations, this proved successful in precisely characterizing a set of SARS-CoV-2 druggable binding pockets including both orthosteric and allosteric sites, which are clearly amenable for virtual screening campaigns and drug repurposing studies. All results generated by the study and the Pockets 2.0 plug-in are available for download

    Using molecular dynamics and enhanced sampling techniques to find cryptic druggable pockets in proteins of pharmaceutical interest

    Get PDF
    Cryptic pockets are sites on protein targets that are hidden in the unliganded form and only become apparent when drugs bind. These sites provide a promising alternative to classical substrate binding sites for drug development, especially when the latter are not druggable. In this thesis I investigate the nature and dynamical properties of cryptic sites in a number of pharmacologically relevant targets, while comparing the efficacy of various simulation-based approaches in discovering them. I found that the studied cryptic sites do not correspond to local minima in the computed conformational free-energy landscape of the unliganded proteins. They thus promptly close in all of the molecular dynamics simulations performed, irrespective of the force-field used. Temperature-based enhanced sampling approaches, such as parallel tempering, do not improve the situation, as the entropic term does not help in the opening of the sites. The use of fragment probes helps, as in long simulations occasionally it leads to the opening and binding to the cryptic sites. The observed mechanism of cryptic site formation is suggestive of interplay between two classical mechanisms: induced-fit and conformational selection. Employing this insight, I developed a novel Hamiltonian replica exchange-based method SWISH (sampling water interfaces through scaled Hamiltonians), which combined with probes resulted in a promising general approach for cryptic site discovery. In addition, we revisit the rather ill-defined concept of the cryptic pockets in order to propose an alternative measurable interpretation. I outline how the new practical definition can be applied to the ligandable targets reported in the PDB, in order to provide a consistent data-driven view on crypticity and how it may impact the drug discovery. This thesis presents a comprehensive study of the cryptic pocket phenomenon: from understanding the nature of their formation to novel detection methodology, and towards understanding their global significance in drug discovery

    Shortest Geometric Paths Analysis in Structural Biology

    Get PDF
    The surface of a macromolecule, such as a protein, represents the contact point of any interaction that molecule has with solvent, ions, small molecules or other macromolecules. Analyzing the surface of macromolecules has a rich history but analyzing the distances from this surface to other surfaces or volumes has not been extensively explored. Many important questions can be answered quantitatively through these analyses. These include: what is the depth of a pocket or groove on the surface? what is the overall depth of the protein? how deeply are atoms buried from the surface? where are the tunnels in a protein? where are the pockets and what are their shapes? A single algorithm to solve one graph problem, namely Dijkstra’s shortest paths algorithm, forms the basis for algorithms to answer these many questions. Many distances can be measured, for instance the distance from the convex hull to the molecular surface while avoiding the interior of the surface is defined as Travel Depth. Alternatively, the distance from the surface to every atom can be measured, giving a measure of the Burial Depth of given residues. Measuring the minimum distance to the protein surface for all points in solvent, combined with topological guidance, allows tunnels to be located. Analyzing the surface from the deepest Travel Depth upwards allows pockets to be catalogued over the entire protein surface for additional shape analysis. Ligand binding sites in proteins are significantly deep, though this does not affect the binding affinity. Hyperthermostable proteins have a less deep surface but bury atoms more deeply, forming more spherical shapes than their mesophilic counterparts. Tunnels through proteins can be identified, for the first time tunnels that are winding or bifurcated can be analyzed. Pockets can be found all over the protein surface and these pockets can be tracked through time series, mutational series, or over protein families. All of these results are new and for the first time provide quantitative and statistical verification of some previous hypotheses about protein shape

    Analysis of shape, properties and "druggability" of protein binding pockets

    Get PDF
    Kenntnisse über die dreidimensionale Struktur therapeutisch relevanter Zielproteine bieten wertvolle Informationen für den rationalen Wirkstoffentwurf. Die stetig wachsende Zahl aufgeklärter Kristallstrukturen von Proteinen ermöglicht eine qualitative und quantitative rechnergestützte Untersuchung von spezifischen Protein-Liganden Wechselwirkungen. Im Rahmen dieser Arbeit wurden neue Algorithmen für die Identifikation und den Ähnlichkeitsvergleich von Proteinbindetaschen und ihren Eigenschaften entwickelt und in dem Programm PocketomePicker zusammengefasst. Die Software gliedert sich in die Routinen PocketPicker, PocketShapelets und PocketGraph. Ferner wurde in dieser Arbeit die Methode ReverseLIQUID reimplementiert und im Rahmen einer Kooperation für das strukturbasierte Virtuelle Screening angewendet. Die genannten Methoden und ihre wissenschaftliche Anwendungen sollte hier zusammengefasst werden: Die Methode PocketPicker ermöglicht die Vorhersage potentieller Bindetaschen auf Proteinoberflächen. Diese Technik implementiert einen geometrischen Ansatz auf Basis „künstlicher Gitter“ zur Identifikation zusammenhängender vergrabener Bereiche der Proteinoberfläche als Orte möglicher Ligandenbindestellen. Die Methode erreicht eine korrekte Vorhersage der tatsächlichen Bindetasche für 73 % der Einträge eines repräsentativen Datensatzes von Proteinstrukturen. Für 90 % der Proteinstrukturen wird die tatsächlich Ligandenbindestelle unter den drei wahrscheinlichsten vorhergesagten Taschen gefunden. PocketPicker übertrifft die Vorhersagequalität anderer etablierter Algorithmen und ermöglicht Taschenidentifikationen auf apo-Strukturen ohne signifikante Einbußen des Vorhersageerfolges. Andere Verfahren weisen deutlich eingeschränkte Ergebnisse bei der Anwendung auf apo-Strukturen auf. PocketPicker erlaubt den alignmentfreien Ähnlichkeitsvergleich von Bindetaschenfor-men durch die Kodierung berechneter Bindevolumen als Korrelationsdeskriptoren. Dieser Ansatz wurde erfolgreich für Funktionsvorhersage von Bindetaschen aus Homologiemodellen von APOBEC3C und Glutamat Dehydrogenase des Malariaerregers Plasmodium falciparum angewendet. Diese beiden Projekte wurden in Zusammenarbeit mit Kollaborationspartnern durchgeführt. Zudem wurden PocketPicker Korrelationsdeskriptoren erfolgreich für die automatisierte Konformationsanalyse der enzymatischen Tasche von Aldose Reduktase angewendet. Für detaillierte Analysen der Form und der physikochemischen Eigenschaften von Proteinbindetaschen wurde in dieser Arbeit die Methode PocketShapelets entwickelt. Diese Technik ermöglicht strukturelle Alignments von extrahierten Bindevolumen durch Zerlegungen der Oberfläche von Proteinbindetaschen. Die Überlagerung gelingt durch die Identifikation strukturell ähnlicher Oberflächenkurvaturen zweier Taschen. PocketShapelets wurde erfolgreich zur Analyse funktioneller Ähnlichkeit von Bindetaschen verwendet, die auf Betrachtungen physikochemischer Eigenschaften basiert. Zur Analyse der topologischen Vielfalt von Bindetaschengeometrien wurde in dieser Arbeit die Methode PocketGraph entwickelt. Dieser Ansatz nutzt das Konzept des sog. „Wachsenden Neuronalen Gases“ aus dem Bereich des maschinellen Lernens für eine automatische Extraktion des strukturellen Aufbaus von Bindetaschen. Ferner ermöglicht diese Methode die Zerlegung einer Bindestelle in ihre Subtaschen. Die von PocketPicker charakterisierten Taschenvolumen bilden die Grundlage für die Methode ReverseLIQUID. Dieses Programm wurde in dieser Arbeit weiterentwickelt und im Rahmen einer Kooperation zur Identifikation eines Inhibitors der Serinprotease HtrA des Erregers Helicobacter pylori verwendet. Mit ReverseLIQUID konnte ein strukturbasiertes Pharmakophormodell für das Virtuelle Screening erstellt werden. Dieser Ansatz ermöglichte die Identifikation einer Substanz mit niedrig mikromolarer Affinität gegenüber der Zielstruktur.Knowledge of the three-dimensional structure therapeutically relevant target proteins provides valuable information for rational drug design. The constantly increasing numbers of available crystal structures enable qualitative and quantitative analysis of specific protein-ligand interactions in silico. In this work novel algorithms for the identification and the comparison of protein binding sites and their properties were developed and combined in the program PocketomePicker. The software combines the routines PocketPicker, PocketShapelets and PocketGraph. Furthermore, the method ReverseLIQUID was re-implemented in this work and used for the structure-based virtual screening with a cooperation partner. The programs and their scientific applications are summarized here: The method PocketPicker is designed for the prediction of potential binding sites on protein surfaces. The technique implements a geometric approach based on the concept of “artificial grids” for the identification of continuous buried regions of the protein surface that might act as potential ligand binding sites. The method yields correct predications of the actual binding site for 73 % of the entries in a representative data set of protein structures. For 90 % of the proteins the actual binding site is found among the top three predicted binding pockets. PocketPicker exceeds the predictive quality of other established algorithms and enables correct binding site identifications on apo structures without significant drops of the prediction success. This is not achieved by other programs. PocketPicker enables alignment-free comparisons of binding site shapes by encoding extracted binding volumes as correlation vectors. This approach was used for successful predictions of binding site functionality for homology models of APOBEC3C and glutamate dehydrogenase of the malaria pathogen Plasmodium falciparum. These projects were carried out with collaboration partners. Furthermore, PocketPicker correlation descriptors were used for automated analysis of binding site conformations of aldose reductase active sites. The method PocketShapelets was implemented in this work for detailed analysis of shapes and physicochemical properties of protein binding sites. This approach enables structural alignments of extracted binding volumes by surface decomposition of protein binding sites. The structural superposition is achieved by identification of structurally similar surface curvatures of different binding pockets. PocketShapelets was successfully used for the analysis of functional similarity of binding sites based on observations of physicochemical properties. PocketGraph was developed for the analysis of the structural diversity of binding site geometries. This approach uses the “Growing Neural Gas” concept used in machine learning for an automated extraction of the structural organization of binding sites. Furthermore, the method enables the decomposition of binding sites into subpockets. The pocket volumes characterized by PocketPicker are the foundation of another program called ReverseLIQUID. This method was refined in this work and used for the identification of a Helicobacter pylori serine protease HtrA inhibitor. This project was performed with a collaboration partner. A receptor-based pharmacophore model was derived using ReverseLIQUID and used for virtual screening. This approach led to the identification of a substance with low micromolar affinity towards the target protein
    • …
    corecore