113 research outputs found

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Computer-aided drug design (CADD) has become an indispensable component of modern drug discovery projects. Predicting the physicochemical and pharmacological properties of candidate compounds increases the probability that drug candidates pass the later phases of clinical trials. Ligand-based virtual screening has advantages over structure-based drug design in its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledge base from which to derive quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for data-mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), is reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain barrier (BBB) passage. LiCABEDS was implemented and integrated with a graphical user interface, data import/export, automated model training and prediction, and project management. In addition, a non-linear ligand classifier was proposed that uses a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units (GPUs) are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported for constructing structurally diverse screening libraries in order to enhance hit rates in high-throughput screening.
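
    The abstract above describes a classifier built from adaptively boosted decision stumps. As a rough illustration only, the following is a generic AdaBoost sketch over binary fingerprints, not the LiCABEDS implementation. The function names, the toy fingerprints, and the choice of one stump per fingerprint bit are all assumptions made for this example.

```python
import math

def stump_output(x, bit, polarity):
    # A decision stump on one fingerprint bit: +polarity if the bit is set, else -polarity.
    return polarity if x[bit] else -polarity

def train_adaboost_stumps(X, y, rounds=5):
    """Generic AdaBoost with one-bit stumps. X: binary fingerprints, y: labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, bit, polarity)
    for _ in range(rounds):
        best = None
        for bit in range(d):
            for polarity in (1, -1):
                # Weighted training error of this candidate stump.
                err = sum(w for w, xi, yi in zip(weights, X, y)
                          if stump_output(xi, bit, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, bit, polarity)
        err, bit, polarity = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # clamp to avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, bit, polarity))
        # Increase the weight of misclassified samples, decrease the others, then renormalise.
        weights = [w * math.exp(-alpha * yi * stump_output(xi, bit, polarity))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_output(x, bit, polarity) for alpha, bit, polarity in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: the label is +1 exactly when bit 0 of the fingerprint is set.
fingerprints = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0],
                [0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1]]
labels = [1, 1, -1, -1, 1, -1]
model = train_adaboost_stumps(fingerprints, labels)
predictions = [predict(model, x) for x in fingerprints]
```

    On this toy set a single bit separates the classes, so every boosting round selects the same stump. Real fingerprint data would normally need many rounds and held-out validation.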

    Comparative investigation of eigenvalue-based molecular descriptors

    Molecular descriptors are numbers or series of numbers used to quantify molecular structure. A special class of molecular descriptors are graph invariants, also known as topological molecular descriptors. Deriving these descriptors is made possible by replacing a molecule with its molecular graph. Many useful mathematical quantities can be calculated from a molecular graph, e.g., eigenvalues, so it became possible to construct molecular descriptors based on eigenvalues. These are called eigenvalue-based topological molecular descriptors. Today there is a plethora of them, but only a few use eigenvalues obtained from the ordinary adjacency matrix. The graph energy, Estrada index, and resolvent energy are the most prominent among them. In this doctoral dissertation, a comparative investigation of these descriptors has been performed. The first part of the Results and discussion chapter reports on the relationships among the graph energy, Estrada index, and resolvent energy. The three eigenvalue-based topological molecular descriptors are compared using several datasets of alkanes and benzenoid hydrocarbons. The relations among them are found and discussed. The structural parameters that govern these relations are identified, and the corresponding formulae based on multiple linear regression have been obtained. It is shown that all three investigated indices encode almost the same structural information about a molecule. They differ only in the extent of their sensitivity to molecular branching and to the number of non-bonding molecular orbitals. Further analysis of the graph energy, the Estrada index, and the resolvent energy concerns the degeneracy of these descriptors. To test their discriminative potential, several classes of chemical-tree isomers were employed. In these sets the number of carbon atoms ranged from 9 to 20. Degeneracy was quantified using a well-established measure. The results show that the graph energy and the Estrada index exhibit a similar level of degeneracy. The abrupt change in the degeneracy of the resolvent energy in the case of chemical trees required further examination. The results indicate that there are many chemical trees with the same resolvent energy. These are called r-equienergetic chemical trees. The results of a search for resolvent-equienergetic chemical trees are then given. The third part of the Results and discussion chapter presents results on the structural sensitivity of the graph energy, the Estrada index, and the resolvent energy for several series of catacondensed and pericondensed isomeric benzenoid hydrocarbons. Structural sensitivity is one of the most important and least investigated properties of graph invariants. Recently, a novel method for assessing the structural sensitivity of topological molecular descriptors was put forward. This algorithm consists of several steps and is based on the Tanimoto index and Morgan circular fingerprints. It was found that the graph energy, Estrada index, and resolvent energy have similar structural sensitivity for catacondensed isomers. The graph energy is the most sensitive to small changes in pericondensed benzenoid hydrocarbons. Additionally, the sensitivity of these descriptors was tested on catacondensed isomers with an increasing number of bays, coves, and fjords. The values of the descriptors change gradually as the number of these structural features increases. The Estrada index and the resolvent energy behave similarly and in some cases show the same structural sensitivity, which may be attributed to the high correlation between them. The fourth part of the Results and discussion chapter presents an examination of the influence of a cycle on the values of the graph energy, the Estrada index, and the resolvent energy. It has been shown that the indices describe fine structural details well, so it can be assumed that if we know how a descriptor correlates with structure, we can also find out how properties depend on structure. To examine the influence of a cycle on the values of eigenvalue-based molecular descriptors, three in silico experiments were designed. The last part of the chapter presents results on the potential chemical applicability of these descriptors. More precisely, their potential for predicting physico-chemical properties was examined. The graph energy, the Estrada index, and the resolvent energy were tested as parameters for predicting the boiling points, heats of formation, and octanol/water partition coefficients of alkanes. It was shown that an eigenvalue-based molecular descriptor cannot be used on its own to predict these physico-chemical properties successfully. The first Zagreb index, the number of zeros in the spectrum, and the number of methyl groups must also be included in the models. The resulting statistics show that models constructed using the Estrada index and the resolvent energy are significantly better than those using the graph energy. This trend is even more pronounced for the octanol/water partition coefficients of alkanes.
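
    The three descriptors compared in the abstract above can be computed directly from the adjacency spectrum of a molecular graph. The abstract does not state their formulas. The sketch below assumes the standard literature definitions for a graph with n vertices and adjacency eigenvalues λ_i: graph energy Σ|λ_i|, Estrada index Σ exp(λ_i), and resolvent energy Σ 1/(n − λ_i). The helper names and the Jacobi eigenvalue routine are choices made for this illustration and are not taken from the dissertation.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]
    for _ in range(sweeps):
        # Stop once the off-diagonal part is numerically zero.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def eigenvalue_descriptors(adjacency):
    """Graph energy, Estrada index and resolvent energy of a simple graph."""
    n = len(adjacency)
    eig = symmetric_eigenvalues(adjacency)
    return {
        "graph_energy": sum(abs(x) for x in eig),
        "estrada_index": sum(math.exp(x) for x in eig),
        # Every adjacency eigenvalue is at most n - 1, so n - x is positive.
        "resolvent_energy": sum(1.0 / (n - x) for x in eig),
    }

# Hydrogen-suppressed graph of propane: a path on three carbon atoms.
propane = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
descriptors = eigenvalue_descriptors(propane)
```

    For the propane path the eigenvalues are −√2, 0 and √2, so the graph energy is 2√2 and the resolvent energy is 25/21. This gives a quick hand check of the routine.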

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. 
A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic

    Kern-basierte Lernverfahren für das virtuelle Screening

    We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression. Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening. In doing so, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow. Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels. 
Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used to identify compounds that are structurally dissimilar to the training samples, leading to projection error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids. The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with a cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural product origins of pharmacophore patterns in synthetic ligands.

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between 'drug-like' molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed that compete with the state of the art on benchmark datasets. The models are able to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships. It has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together with existing tools, these contributions will aid the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Novel Surrogate Measures Based on a Similarity Network for Neural Architecture Search

    We propose two novel surrogate measures that predict the validation accuracy of the classification produced by a given neural architecture, eliminating the need to train it and thereby speeding up neural architecture search (NAS). The surrogate measures are based on a solution similarity network, where the distance between solutions is measured using the binary encoding of some graph sub-components of the neural architectures. These surrogate measures are implemented within local search and differential evolution algorithms and tested on the NAS-Bench-101 and NAS-Bench-301 datasets. The results show that the performance of the similarity-network-based predictors, as measured by the correlation between predicted and true accuracy values, is comparable to that of the state-of-the-art predictors in the literature; however, they are significantly faster in achieving these high correlation values for NAS-Bench-101. Furthermore, in some cases, the use of these predictors significantly improves the search performance over the equivalent algorithm (differential evolution or local search) that does not use the predictor.

    Estimating functional connectivity symmetry between oxy- and deoxy-haemoglobin: implications for fNIRS connectivity analysis

    Functional near-infrared spectroscopy (fNIRS) connectivity analysis is often performed using the measured oxy-haemoglobin (HbO2) signal, while the deoxy-haemoglobin (HHb) signal is largely ignored. The information shared by the connectivity networks of HbO2 and HHb is not regularly reported or, worse, the networks are simply assumed to be similar. Here we describe a methodology for estimating the symmetry between the functional connectivity (FC) networks of HbO2 and HHb, and we propose a differential symmetry index (DSI) indicative of the shared physiological information. Our hypothesis is that the symmetry between FC networks associated with HbO2 and HHb is above what would be expected from random networks. FC analysis was performed on fNIRS data collected from six freely moving healthy volunteers over 16 locations on the prefrontal cortex during a real-world task in an out-of-the-lab environment. In addition, systemic data including breathing rate (BR) and heart rate (HR) were synchronously collected and used within the FC analysis. FC networks for HbO2 and HHb were established independently using a Bayesian network analysis. The DSI between the two haemoglobin (Hb) networks, with and without systemic influence, was calculated. The relationship between the symmetry of the HbO2 and HHb networks and the segregational and integrational characteristics of the networks (modularity and global efficiency, respectively) was further described. Consideration of systemic information increases the path lengths of the connectivity networks by 3%. Sparse networks exhibited higher asymmetry than dense networks. Importantly, the symmetry between our experimental HbO2 and HHb connectivity networks departs from random (t-test: t(509) = 26.39, p < 0.0001). The DSI distribution suggests a threshold of 0.2 for deciding whether both HbO2 and HHb FC networks ought to be studied. For sparse FC networks, analysis of both haemoglobin species is strongly recommended. 
Our DSI can provide a quantifiable guideline for deciding whether to proceed with a single Hb network or both in FC analysis.

    Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity

    The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found at the sequence level. Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates and ligands, occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalytic mechanism. From an algorithmic point of view, the comparison of structured data or, more generally, complex objects can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not a priori clear which strategy is more suitable. In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for the modeling of protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. 
The algorithms developed in this thesis are based on this modeling concept, hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The developed algorithms were mainly designed on the premise to allow for a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of the protein structures. A main motivation was to allow for the detection of more remote similarities, which are not apparent by using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in the field of structural bioinformatics in order to assess and compare their performance and suitability for different problems. Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect in the experiments was the question, which methodological concept, local, global or a combination of both, offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis