MS3ALIGN: an efficient molecular surface aligner using the topology of surface curvature
Background: Aligning similar molecular structures is an important step in the analysis of biomolecular structure and function. Molecular surfaces are simple representations of molecular structure that are easily constructed from various forms of molecular data, such as 3D atomic coordinates (PDB) and electron microscopy (EM) data. Methods: We present a Multi-Scale Morse-Smale Molecular-Surface Alignment tool, MS3ALIGN, which aligns molecular surfaces based on significant protrusions on the molecular surface. The input is a pair of molecular surfaces represented as triangle meshes. A key advantage of MS3ALIGN is its computational efficiency, which is achieved because it processes only a few carefully chosen protrusions on the molecular surface. Furthermore, the alignments are partial in nature and therefore allow inexact surfaces to be aligned. Results: The method is evaluated in four settings. First, we establish performance using known alignments with varying overlap and noise values. Second, we compare the method with SurfComp, an existing surface alignment method. We show that we are able to determine the alignments reported by SurfComp, as well as report relevant alignments not found by SurfComp. Third, we validate the ability of MS3ALIGN to determine alignments in the case of structurally dissimilar binding sites. Fourth, we demonstrate the ability of MS3ALIGN to align iso-surfaces derived from cryo-electron microscopy scans. Conclusions: We have presented an algorithm that aligns molecular surfaces based on the topology of surface curvature.
Geometric algorithms for cavity detection on protein surfaces
Macromolecular structures such as proteins drive cellular processes and functions.
These biological functions result from interactions between proteins and peptides,
catalytic substrates, nucleotides or even human-made chemicals. Thus, several
interactions can be distinguished: protein-ligand, protein-protein, protein-DNA,
and so on. Furthermore, those interactions only happen under chemical- and shape-complementarity
conditions, and usually take place in regions known as binding sites.
Typically, a protein consists of four structural levels. The primary structure of a protein
is made up of its amino acid sequences (or chains). Its secondary structure essentially
comprises α-helices and β-sheets, which are sub-sequences (or sub-domains) of amino
acids of the primary structure. Its tertiary structure results from the composition of
sub-domains into domains, which represent the geometric shape of the protein. Finally,
the quaternary structure of a protein results from the aggregate of two or more
tertiary structures, usually known as a protein complex.
This thesis fits in the scope of structure-based drug design and protein docking. Specifically,
it addresses the fundamental problem of detecting and identifying protein
cavities, which are often seen as tentative binding sites for ligands in protein-ligand
interactions. In general, cavity prediction algorithms split into three main categories:
energy-based, geometry-based, and evolution-based. Evolutionary methods build upon
evolutionary sequence conservation estimates; that is, these methods allow us to detect
functional sites through the computation of the evolutionary conservation of the
positions of amino acids in proteins. Energy-based methods build upon the computation
of interaction energies between protein and ligand atoms. In turn, geometry-based algorithms
build upon the analysis of the geometric shape of the protein (i.e., its tertiary
structure) to identify cavities. This thesis focuses on geometric methods.
We introduce here three new geometry-based algorithms for protein cavity detection.
The main contribution of this thesis lies in the use of computer graphics techniques
in the analysis and recognition of cavities in proteins, much in the spirit of molecular
graphics and modeling. As seen further ahead, these techniques include field-of-view
(FoV), voxel ray casting, back-face culling, shape diameter functions, Morse theory,
and critical points. The leading idea is to come up with protein shape segmentation,
much like we commonly do in mesh segmentation in computer graphics. In practice,
protein cavity algorithms are nothing more than segmentation algorithms designed for
proteins.
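The abstract above lists voxel ray casting among its techniques without giving details. As a rough illustration of the general idea only, and not the thesis's algorithm, the sketch below casts a few rays from a probe point and counts how many are blocked by atom spheres. The function name `ray_hits`, the six axis-aligned directions, the step size, the distance cutoff and the toy atom data are all assumptions made for this example.

```python
import math

# Six axis-aligned ray directions: a deliberately coarse, assumed sampling.
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def ray_hits(probe, atoms, directions=AXES, max_dist=10.0, step=0.5):
    """Count the rays from `probe` that enter an atom sphere within `max_dist`.

    `atoms` is a list of ((x, y, z), radius) pairs. Each ray is sampled
    every `step` length units, so spheres smaller than `step` may be missed.
    """
    hits = 0
    for d in directions:
        t = step
        while t <= max_dist:
            p = tuple(probe[i] + t * d[i] for i in range(3))
            if any(math.dist(p, centre) <= radius for centre, radius in atoms):
                hits += 1
                break
            t += step
    return hits


# Toy "cup" of five atoms that is open towards +z (hypothetical data).
CUP = [((3, 0, 0), 1.5), ((-3, 0, 0), 1.5), ((0, 3, 0), 1.5),
       ((0, -3, 0), 1.5), ((0, 0, -3), 1.5)]

inside = ray_hits((0, 0, 0), CUP)    # probe sitting in the cup
outside = ray_hits((0, 0, 12), CUP)  # probe far above the opening
```

Here the probe inside the cup is blocked in five of six directions, while the distant probe is blocked in none. A cavity detector built on this idea would label probes as buried when the blocked-ray count exceeds some chosen threshold.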
Analysis of shape, properties and "druggability" of protein binding pockets
Knowledge of the three-dimensional structure of therapeutically relevant target proteins provides valuable information for rational drug design. The constantly increasing number of available crystal structures enables qualitative and quantitative analysis of specific protein-ligand interactions in silico. In this work, novel algorithms for the identification and comparison of protein binding sites and their properties were developed and combined in the program PocketomePicker. The software comprises the routines PocketPicker, PocketShapelets and PocketGraph. Furthermore, the method ReverseLIQUID was re-implemented in this work and used for structure-based virtual screening with a cooperation partner. The programs and their scientific applications are summarized here: The method PocketPicker is designed for the prediction of potential binding sites on protein surfaces. The technique implements a geometric approach based on the concept of “artificial grids” for the identification of continuous buried regions of the protein surface that might act as potential ligand binding sites. The method yields correct predictions of the actual binding site for 73 % of the entries in a representative data set of protein structures. For 90 % of the proteins the actual binding site is found among the top three predicted binding pockets. PocketPicker exceeds the predictive quality of other established algorithms and enables correct binding site identification on apo structures without a significant drop in prediction success. Other methods show markedly reduced performance on apo structures. PocketPicker enables alignment-free comparisons of binding site shapes by encoding extracted binding volumes as correlation vectors.
This approach was used to successfully predict binding site function for homology models of APOBEC3C and of glutamate dehydrogenase from the malaria pathogen Plasmodium falciparum. Both projects were carried out with collaboration partners. Furthermore, PocketPicker correlation descriptors were used for automated conformational analysis of the aldose reductase active site. The method PocketShapelets was developed in this work for detailed analysis of the shapes and physicochemical properties of protein binding sites. This approach enables structural alignments of extracted binding volumes by surface decomposition of protein binding sites. The structural superposition is achieved by identifying structurally similar surface curvatures in the two binding pockets. PocketShapelets was successfully used to analyze the functional similarity of binding sites based on their physicochemical properties. PocketGraph was developed for the analysis of the topological diversity of binding site geometries. This approach uses the “Growing Neural Gas” concept from machine learning for an automated extraction of the structural organization of binding sites. Furthermore, the method enables the decomposition of binding sites into subpockets. The pocket volumes characterized by PocketPicker are the foundation of another program called ReverseLIQUID. This method was refined in this work and used, together with a collaboration partner, to identify an inhibitor of the Helicobacter pylori serine protease HtrA. A receptor-based pharmacophore model was derived using ReverseLIQUID and used for virtual screening. This approach led to the identification of a substance with low micromolar affinity towards the target protein.
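The abstract above mentions alignment-free shape comparison by encoding binding volumes as correlation vectors, but does not specify the encoding. The sketch below shows one simple way such a descriptor can work, assuming a plain histogram of pairwise distances between pocket grid points. It is not PocketPicker's actual descriptor; the function names, bin width and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def correlation_vector(points, bin_width=1.5, n_bins=4):
    """Histogram of pairwise distances between points.

    Only distances are used, so the vector is unchanged when the whole
    point set is rotated or translated. Pairs beyond the last bin are dropped.
    """
    vec = [0] * n_bins
    for p, q in combinations(points, 2):
        b = int(math.dist(p, q) // bin_width)
        if b < n_bins:
            vec[b] += 1
    return vec


def descriptor_distance(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical pocket grid points, plus the same pocket rotated by 90 degrees
# about the z axis and shifted, plus a differently shaped (linear) point set.
pocket = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 1)]
moved = [(-y + 5, x - 2, z + 1) for x, y, z in pocket]
line = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)]

same = descriptor_distance(correlation_vector(pocket), correlation_vector(moved))
diff = descriptor_distance(correlation_vector(pocket), correlation_vector(line))
```

Because the descriptor depends only on internal distances, the rotated and shifted copy gives a descriptor distance of zero without any superposition step, while the linear point set gives a positive distance.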