33 research outputs found

    Graph-Based Feature Selection Approach for Molecular Activity Prediction

    In the construction of QSAR models for the prediction of molecular activity, feature selection is a common task aimed at improving the results and understanding of the problem. The selection of features allows elimination of irrelevant and redundant features, reduces the effect of dimensionality problems, and improves the generalization and interpretability of the models. In many feature selection applications, such as those based on ensembles of feature selectors, it is necessary to combine different selection processes. In this work, we evaluate the application of a new feature selection approach to the prediction of molecular activity, based on the construction of an undirected graph to combine base feature selectors. The experimental results demonstrate the efficiency of the graph-based method in terms of the classification performance, reduction, and redundancy compared to the standard voting method. The graph-based method can be extended to different feature selection algorithms and applied to other cheminformatics problems

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) modeling are used to correlate the structure of a molecule with its biological properties in drug design and toxicological studies. In this body of work, I started with two in-depth reviews of the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, combined with bagging as an ensemble strategy, were evaluated. Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). Second, the ability to separate the few active compounds from the vast number of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention mechanism. 
The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features correlate with biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based Quantitative Structure-Activity/Property Relationship (QSAR) efforts for enhanced drug discovery and toxicology assessments.

    Similarity Methods in Chemoinformatics


    Antimalarial drug design: targeting the Plasmodium falciparum cytochrome bc1 complex through computational modelling, chemical synthesis and biological testing

    Malaria is a life-threatening disease responsible for roughly one million deaths annually. Previous successes in attempting to eradicate the disease have been short-lived, owing to the increasing development of resistance in the parasite. There is a continued need for novel compounds that act at novel therapeutic targets, and the Plasmodium falciparum cytochrome bc1 complex (Pfbc1) is one such target. Its inhibition halts the biochemical generation of ATP, resulting in parasite cell death. The work described in this thesis used molecular modelling, synthesis and biological testing to develop novel antimalarial compounds that selectively inhibit this target. The structural details of a number of compounds known to be active or inactive against Pfbc1 were used in combination with six different ligand-based virtual screening techniques, applied to the ZINC lead-like library of compounds, to identify potential chemotypes active against malaria. These methods included fingerprint similarity searching, principal component analysis, and naïve Bayesian classification. The hits from each of these methods were merged and formed part of a consensus analysis in which compounds identified across several methods were deemed of more interest than those which appeared less frequently. Each molecule was given a score based on its occurrence in the virtual screening methods and also its physicochemical properties. Compounds were filtered to remove those with unfavourable chemical properties or known toxicophores. Nineteen compounds were ultimately purchased and tested in vitro against the 3D7 strain of the malaria parasite. Five of the compounds showed single-digit µM IC50 values, each containing novel structural chemotypes. The lead candidate contained a benzothiazole core and showed an IC50 value against 3D7 of 4.53 ± 1.86 µM. 
Additional testing showed the compounds to be inactive against bovine bc1, which is promising because strong bovine bc1 inhibition has been shown to be indicative of cardiotoxicity in humans. Molecular docking was extensively employed to rationalise the activity of Pfbc1 inhibitors such as atovaquone and HDQ. A number of quinolone-containing compounds were also subjected to docking, with key observations made regarding interactions thought to be crucial to their antimalarial activity. The hits from ligand-based virtual screening (LBVS) were also docked, further supporting their potential as Pfbc1 inhibitors. QSARs were developed for a series of 4-aminoquinoline compounds which had been tested against both the NF54 and K1 strains of malaria. MLR, PLS and kNN machine learning methods were investigated, and the molecular descriptors contained within valid models were interpreted. Significant models were identified and shown to have strong predictive abilities for both strains. QSAR models were similarly developed for a series of thiazolide compounds with activity against hepatitis C. SVM was found to give a significant model which was able to predict the cell safety of the thiazolide derivatives. The rational design of the novel pyrroloquinolone chemotype led to the synthesis of seven analogues, via alkylation and Winterfeldt oxidation reactions, to investigate its SAR. The compounds showed 3D7 activity values between 75 nM and 1.02 µM, with molecular docking supporting their potential for Qo binding and thus Pfbc1 inhibition.
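The consensus step summarised in this abstract (merging the hits from several ligand-based virtual screening methods and favouring compounds returned by more than one of them) can be illustrated with a minimal sketch. The method names, compound identifiers, and the purely count-based score are assumptions made for illustration; the thesis also includes physicochemical properties in its score, which is not reproduced here.

```python
from collections import Counter


def consensus_rank(hits_by_method):
    """Rank compounds by how many screening methods returned them."""
    counts = Counter()
    for hits in hits_by_method.values():
        counts.update(set(hits))  # count each compound once per method
    # Sort by descending count, then by identifier for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical hit lists; identifiers are placeholders, not real ZINC IDs.
hits_by_method = {
    "fingerprint_similarity": ["cpd_A", "cpd_B", "cpd_C"],
    "pca": ["cpd_B", "cpd_C", "cpd_D"],
    "naive_bayes": ["cpd_C", "cpd_E"],
}
ranking = consensus_rank(hits_by_method)
```

Here `ranking` lists the compounds found by the most methods first, so a cut-off on the count gives a simple consensus filter.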

    Computational Approaches: Drug Discovery and Design in Medicinal Chemistry and Bioinformatics

    This book is a collection of original research articles in the field of computer-aided drug design. It reports the use of current and validated computational approaches applied to drug discovery as well as the development of new computational tools to identify new and more potent drugs

    Development, validation and application of in-silico methods to predict the macromolecular targets of small organic compounds

    Computational methods to predict the macromolecular targets of small organic drugs and drug-like compounds play a key role in early drug discovery and drug repurposing efforts. These methods are developed by building predictive models that aim to learn the relationships between compounds and their targets in order to predict the bioactivity of the compounds. In this thesis, we analyzed the strategies used to validate target prediction approaches and how current strategies leave crucial questions about performance unanswered. Namely, how does an approach perform on a compound of interest, with its structural specificities, as opposed to the average query compound in the test data? We constructed and present new guidelines on validation strategies to address these short-comings. We then present the development and validation of two ligand-based target prediction approaches: a similarity-based approach and a binary relevance random forest (machine learning) based approach, which have a wide coverage of the target space. Importantly, we applied a new validation protocol to benchmark the performance of these approaches. The approaches were tested under three scenarios: a standard testing scenario with external data, a standard time-split scenario, and a close-to-real-world test scenario. We disaggregated the performance based on the distance of the testing data to the reference knowledge base, giving a more nuanced view of the performance of the approaches. We showed that, surprisingly, the similarity-based approach generally performed better than the machine learning based approach under all testing scenarios, while also having a target coverage which was twice as large. After validating two target prediction approaches, we present our work on a large-scale application of computational target prediction to curate optimized compound libraries. 
While screening large collections of compounds against biological targets is key to identifying new bioactivities, it is resource intensive and challenging. Small to medium-sized libraries that have been optimized to have a higher chance of producing a true hit on an arbitrary target of interest are therefore valuable. We curated libraries of readily purchasable compounds by: i. utilizing property filters to ensure that the compounds have key physicochemical properties and are not overly reactive, ii. applying a similarity-based target prediction method, with a wide target scope, to predict the bioactivities of compounds, and iii. employing a genetic algorithm to select compounds for the library to maximize the biological diversity in the predicted bioactivities. These enriched small to medium-sized compound libraries provide valuable tool compounds to support early drug development and target identification efforts, and have been made available to the community. The distinctive contributions of this thesis include the development and benchmarking of two ligand-based target prediction approaches under novel validation scenarios, and the application of target prediction to enrich screening libraries with biologically diverse bioactive compounds. We hope that the insights presented in this thesis will help push data-driven drug discovery forward.
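To make the idea of similarity-based target prediction concrete, the following is a minimal sketch under stated assumptions. The abstract does not say which similarity measure or scoring rule the thesis uses; Tanimoto similarity on sets of fingerprint bits and a "most similar known ligand" score per target are assumptions chosen here because they are common in the field. The function names, the toy knowledge base, and the target labels are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(query, knowledge_base, top_k=2):
    """Score each target by its most similar known ligand (assumed rule)."""
    best = {}
    for fingerprint, target in knowledge_base:
        sim = tanimoto(query, fingerprint)
        if sim > best.get(target, -1.0):
            best[target] = sim
    # Highest similarity first; ties broken by target name.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy knowledge base: (set of on-bits, target label); all values are made up.
knowledge_base = [
    (frozenset({1, 2, 3, 4}), "target_X"),
    (frozenset({1, 2, 7}), "target_Y"),
    (frozenset({8, 9}), "target_Z"),
]
query = frozenset({1, 2, 3})
predictions = predict_targets(query, knowledge_base)
```

The similarity of the best match also gives a rough measure of how far a query lies from the reference knowledge base, which is the kind of distance the abstract uses to disaggregate performance.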

    Exploring Molecular Diversity: There is Plenty of Room at Markush's

    The early drug discovery strategy is commonly based on a hit-to-lead process, which involves extensive research on the synthesis of derivatives of an original molecule that has previously shown biological activity against a specific biological target. This process therefore implies the synthesis of many analogs that describe a chemical sub-library, which generally shows that these studies are highly focused on the chemical space near the hit compound. However, when the drug is finally patented, a much wider chemical space is described by means of Markush structures, on the assumption that some of the analogs within it may also present biological activity. Nevertheless, a claim involving a Markush structure does not imply the proven synthesis of the whole chemical library, only of a small sample of it. We hypothesize that a large part of the chemical space of these libraries is unexplored and may hide potential lead candidates that could even surpass the activity of the original hit. This project proposes an alternative: a rational selection of a small sample of molecules, based on similarity-based clustering, can represent the stated chemical space more meaningfully and offers the possibility of exploring unknown regions that could hide further biological potential. After a review of the drugs approved by the FDA from 2008 to 2020 and of the ChEMBL database of bioactive molecules, the resulting wide chemical space of small molecules with drug-like properties was explored in order to define new accessible regions that might hide biological activity. 
The results obtained from seven real case studies show that both randomly and rationally selected molecules represent the combinatorial libraries stated in the patents more meaningfully than the molecules reported to date. Furthermore, two practical studies implementing the suggested methodology were carried out to better describe the chemical space of the antimalarial drug Tafenoquine and of Dacomitinib, a second-generation tyrosine kinase inhibitor for the treatment of non-small-cell lung cancer. The exploration of the chemical space of these two families has led to the rational synthesis of seven antimalarial analogs and eight kinase inhibitors, which have shown interesting inhibitory activities. These results show that applying cheminformatics to library selection may improve the ability to inspect chemical datasets in order to identify new potential hits and to represent large libraries for further reprofiling purposes.

    Application of multivariate statistics and machine learning to phenotypic imaging and chemical high-content data

    Image-based high-content screens (HCS) hold tremendous promise for cell-based phenotypic screens. Challenges related to HCS include not only storage and management of data, but also critical analysis of the complex image-based data. I implemented a data storage and screen management framework and developed approaches for data analysis of a number of high-content microscopy screen formats. I visualized and analysed pilot screens to develop a robust multi-parametric assay for the identification of genes involved in DNA damage repair in HeLa cells. Further, I developed and implemented new approaches for image processing and screen data normalization. My analyses revealed that the ubiquitin ligase RNF8 plays a central role in the DNA-damage response and that a related ubiquitin ligase, RNF168, causes the cellular and developmental phenotypes characteristic of RIDDLE syndrome. My approaches also uncovered a role for the MMS22L-TONSL complex in DSB repair and in the recombination-dependent repair of stalled or collapsed replication forks. The discovery of novel bioactive molecules is a challenge because the fraction of active candidate molecules is usually small and confounded by noise in experimental readouts. Cheminformatics can improve the robustness of chemical high-throughput screens and functional genomics data sets by taking structure-activity relationships into account. I applied statistics, machine learning and cheminformatics to different data sets to discern novel bioactive compounds. I showed that phenothiazines and apomorphines are regulators of cell differentiation in murine embryonic stem cells. Further, I pioneered computational methods for the identification of structural features that influence the degradation and retention of compounds in the nematode C. elegans. I used cheminformatics to assemble a comprehensive screening library of previously approved drugs for redeployment in new bioassays. 
A combination of chemical genetic interactions, cheminformatics and machine learning allowed me to predict novel synergistic antifungal small molecule combinations from sensitized screens with the drug library. In another study on the biological effects of commonly prescribed psychoactive compounds, I discovered a strong link between lipophilicity and bioactivity of compounds in yeast and unexpected off-target effects that could account for unwanted side effects in humans. I also investigated structure-activity relationships and assessed the chemical diversity of a compound collection that was used to probe chemical-genetic interactions in yeast. Finally, I have made these methods and tools available to the scientific community, including an open source software package called MolClass that allows researchers to make predictions about bioactivity of small molecules based on their chemical structure

    Identification of structure activity relationships in primary screening data of high-throughput screening assays

    The aim of the thesis was to identify structure-activity relationships (SAR) in the primary screening data of high-throughput screening (HTS) assays. The strategy was to perform a hierarchical clustering of the molecules, assign the primary screening data to the created clusters and derive models from the clusters. The models should serve to identify singletons, clusters enriched with actives, not confirmed hits and false-negatives. Two hierarchical clustering algorithms, NIPALSTREE and hierarchical k-means, were developed and adapted for this purpose, respectively. A graphical user interface (GUI) was implemented to extract SAR from the clustering results. Retrospective and prospective applications of the clustering approach were performed. SAR models were created by combining the clustering results with different chemoinformatic methods. NIPALSTREE projects a data set onto one dimension using principal component analysis. The data set is sorted according to the score vector and split at the median position into two subsets. The algorithm is then applied recursively to the subsets. The hierarchical k-means algorithm recursively separates a data set into two clusters using the k-means algorithm. Both algorithms are capable of clustering large data sets with more than a million data points. They were validated and compared to each other on the basis of different structural classes. With its loading vectors, NIPALSTREE provided first insights into SAR, whereas the hierarchical k-means yielded superior results. A GUI was developed that allows the clustering results to be displayed and navigated. Functionalities were integrated to analyse the clusters in the dendrogram, the molecules in a cluster, and the physicochemical properties of a molecule. Measures were developed to identify clusters enriched with actives, to characterize singletons and to analyse selectivity and specificity. 
Different protease inhibitors of the COBRA database were examined using the hierarchical k-means algorithm. Supported by similarity searches and nearest neighbour analyses, thrombin inhibitor singletons were quickly isolated and displayed in the dendrogram. By scaling enrichment factors to the logarithm of the dendrogram level, clusters enriched with different structural classes of factor Xa inhibitors were simultaneously identified. The observed co-clustering of other protease inhibitors provided a deeper insight into selectivity and specificity and shows the utility of the approach for constructing focussed screening libraries. Specificity was analyzed by extracting and clustering relative frequencies of the protease inhibitors from the clusters of dendrogram level 7. A unique ligand-based point of view on the pocketome of the protease enzymes was obtained. To identify not confirmed hits and false-negatives in the primary screening data of HTS assays, three assays were retrospectively analysed with the hierarchical k-means algorithm. A rule catalogue was developed that judges hits in terminal clusters based on the cluster size, the percent control values of the entries in a cluster, the overall hit rate, the hit rate in the cluster and the environment of a cluster in the dendrogram. It resulted in the identification of a high proportion of not confirmed hits and provided for each hit a rating in the context of related non-hits. This allows compounds to be prioritized for follow-up studies. Non-hits and hits were retrieved from terminal clusters containing hits. Molecules bearing false-negative scaffolds were co-extracted and enriched. To minimize the number of false-positives in the extracted lists, Bayesian regularized artificial neural network classification models were trained with the data. Applying the models yielded a marked improvement in the enrichment factors for the false-negatives. This demonstrates the scaffold-hopping potential of the approach. 
NIPALSTREE, the hierarchical k-means algorithm and self-organising maps were prospectively applied to identify novel lead candidates for dopamine D3 receptors. Compounds with novel scaffolds and low nanomolar binding affinity (65 nM, compound 42) were identified. To provide a deeper insight into the SAR of these molecules, different alternative computational methods were employed. Support vector-based regression and partial least squares were examined. Predictive models for dopamine D2 and D3 receptor binding affinity values were obtained. Important features explaining SAR were extracted from the models. The prospective application of the models to the diverse and novel virtual screening data met with only limited success. Docking studies were performed using a homology model of the dopamine D3 receptor. The visual inspection of the binding modes resulted in the hypothesis of two alternative binding pockets for the aryl moiety of dopamine D3 receptor antagonists. A pharmacophore model was created that requires both aryl moieties simultaneously. Virtual screening with the model identified a nanomolar hit (65 nM, compound 59), corroborating the hypothesis of the two binding pockets and providing a new lead structure for dopamine D3 receptors. The presented data show that hierarchically clustering a data set and subsequently using the clusters for model generation is suited to extracting SAR from screening data. The models are successful in identifying singletons, clusters enriched with actives, not confirmed hits and false-negative scaffolds.
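The NIPALSTREE procedure summarised in this abstract (project the data onto one principal component, sort by score, split at the median, recurse) can be sketched as follows. This is an illustrative reading of the abstract only, not the thesis implementation: plain power iteration stands in for the NIPALS algorithm, and the function names, the `min_size` stopping rule, and the toy data are assumptions.

```python
def first_pc_scores(rows, iterations=100):
    """Project mean-centred rows onto their first principal component.

    Plain power iteration on X^T X is used as a stand-in for NIPALS.
    The all-ones start vector is an assumption and would fail if it were
    exactly orthogonal to the first component.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    w = [1.0] * d
    for _ in range(iterations):
        scores = [sum(x[j] * w[j] for j in range(d)) for x in centred]
        w_new = [sum(scores[i] * centred[i][j] for i in range(n))
                 for j in range(d)]
        norm = sum(v * v for v in w_new) ** 0.5
        if norm == 0.0:  # no variance along w; keep the current direction
            break
        w = [v / norm for v in w_new]
    return [sum(x[j] * w[j] for j in range(d)) for x in centred]


def median_split_tree(rows, ids, min_size=2):
    """Recursively split ids at the median position of their PC scores."""
    if len(ids) <= min_size:
        return ids  # terminal cluster
    scores = first_pc_scores([rows[i] for i in ids])
    order = [i for _, i in sorted(zip(scores, ids))]
    mid = len(order) // 2
    return [median_split_tree(rows, order[:mid], min_size),
            median_split_tree(rows, order[mid:], min_size)]


# Toy descriptor matrix with two well-separated groups (made-up values).
rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [5.0, 5.1], [5.2, 5.0], [5.1, 5.3]]
tree = median_split_tree(rows, list(range(len(rows))), min_size=3)
```

Because every split is at the median position, the resulting tree is balanced by construction, which is one reason such a scheme scales to very large data sets; the hierarchical k-means variant described in the abstract would replace the median split with a two-cluster k-means step.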