
    Designing algorithms to aid discovery by chemical robots

    Recently, automated robotic systems have become very efficient, thanks to improved coupling between sensor systems and algorithms, of which the latter have been gaining significance thanks to the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery orientated tasks need to be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, coupled with the increased amount of chemical data available, and automation and control systems may allow more productive chemical research and the development of chemical robots able to target discovery. This is shown through examples of workflow and data processing with automation and control, and through the use of both well-used and cutting-edge algorithms illustrated using recent studies in chemistry. Finally, several algorithms are presented in relation to chemical robots and chemical intelligence for knowledge discovery

    Kernel-based learning methods for virtual screening (Kern-basierte Lernverfahren für das virtuelle Screening)

    We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression. Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening; in this, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow. Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels.
Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used for the identification of compounds structurally dissimilar to the training samples, leading to projection error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids. The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural product origins of pharmacophore patterns in synthetic ligands.
We investigate modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we develop a new graph kernel based on iterative graph similarity and optimal vertex assignments, apply kernel principal component analysis to projection error-based novelty detection, and describe the discovery of a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression. Virtual screening is the computational prioritization of molecules with respect to a predicted property. It is a cheminformatics problem that arises in the hit-finding phase of drug development. Its ligand-based variant relies on the similarity principle, according to which (structurally) similar molecules tend to have similar properties. In describing the approach based on kernel learning methods, we stress the importance of molecular representations, including the (dis)similarity measures defined on them. We investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based methods, and review literature recommendations on retrospective validation, including an example workflow. Graph kernels are formal similarity measures that correspond to inner products and are defined directly on graphs, e.g. annotated molecular structure graphs. We review the literature on graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity approach, we develop the iterative similarity optimal assignment graph kernel. We describe an iterative algorithm, show its convergence and the uniqueness of the solution, and give an upper bound on the number of iterations required.
In a retrospective study, our graph kernel consistently outperformed chemical descriptors and other graph kernels based on optimal vertex assignments. Chemical data sets often lie on manifolds of lower dimensionality than the surrounding chemical descriptor space. Dimensionality reduction methods allow these manifolds to be identified and thereby provide descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new data are described by such models. This can be used to identify molecules that are structurally dissimilar to the training data, enabling projection error-based novelty detection for virtual screening using only positive examples. We carry out a feasibility study on whether the concept of fatty acids can be learned by principal component analysis. The peroxisome proliferator-activated receptor (PPAR) is a receptor found in the cell nucleus that regulates lipid and glucose metabolism. It plays an important role in the development of diseases such as type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using chemical descriptors and our graph kernel by learning multiple kernels simultaneously. Screening of a commercial compound library and subsequent testing of 15 selected compounds in a cell-based transactivation assay yielded four active compounds. One of them, a natural product with a cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). Our study delivers a new PPAR gamma agonist, reveals the mechanism of action of a bioactive natural product, and allows conclusions about the natural product origins of pharmacophore patterns in synthetic ligands
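The projection error-based novelty detection mentioned in this abstract can be illustrated with a minimal sketch. It uses plain (linear) principal component analysis rather than the kernel variant, and the three-dimensional toy data, the function names `fit_pca` and `projection_error`, and the example points are all assumptions made for illustration, not the thesis's data or code.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def projection_error(X, mean, components):
    """Squared distance between each sample and its projection onto the PCA subspace."""
    centred = X - mean
    projected = centred @ components.T @ components
    return np.sum((centred - projected) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical "positive" training samples lying close to a line in a 3-D descriptor space.
t = rng.normal(size=(200, 1))
train = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))
mean, components = fit_pca(train, n_components=1)

inlier = np.array([[0.5, 1.0, -0.5]])   # lies on the training manifold
outlier = np.array([[2.0, -1.0, 0.0]])  # lies off the training manifold
err_in = projection_error(inlier, mean, components)[0]
err_out = projection_error(outlier, mean, components)[0]
print(f"inlier error: {err_in:.6f}, outlier error: {err_out:.3f}")
```

A sample with a large projection error is poorly described by the model learned from the positive examples and would be flagged as novel; where to place the threshold is a separate choice that this sketch does not address.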

    Combined in silico/in vitro screening tools for identification of new insulin receptor ligands

    The interaction of insulin with the extracellular part of the insulin receptor is a decisive step in the insulin signalling pathway. The insulin receptor is then autophosphorylated and the intracellular tyrosine kinase domain is activated. In 1999, Zhang et al. published a compound that was found in a fungal extract and can activate the human insulin receptor by interacting directly with the intracellular domain of the beta subunit. This substance (demethylasterriquinone B-1, DMAQ-B1) is able to lower blood glucose levels in mouse models of type 2 diabetes. In recent years, structures and activity values for approximately 100 derivatives of this substance have been published. Most of these compounds contain a quinone substructure, which could lead to toxic side effects. Since the treatment of type 2 diabetes involves long-term therapy with antidiabetic drugs, it would be advantageous to find insulin receptor-activating compounds from a different structural class. The aim of this dissertation was to develop computational models that can lead to the identification of new insulin-mimetic compounds, and subsequently to validate the models in biological (cell-based) experiments. Three different ligand-based methods, namely self-organizing maps, fingerprint similarity and shape similarity, were used to search a large commercial database for potential insulin receptor-activating compounds. By testing 13 representative compounds of the identified compound classes, we identified three structures that activated Akt, a kinase downstream of the activated insulin receptor. One of these substances was able to enhance glucose uptake in muscle cells. Derivatives of this structure were investigated to obtain further information on structure-activity relationships.
In addition, the cytotoxicity of the substances was tested to show that the insulin-mimetic activity of the identified molecules does not correlate with toxic effects.
The binding of insulin to the extracellular part of the insulin receptor is a key step in the insulin signalling pathway. Upon binding, the receptor is autophosphorylated and the intracellular tyrosine kinase is activated. In 1999, Zhang et al. published a small molecule identified from a fungal extract, which activates the human insulin receptor by binding directly to the intracellular domain of its beta-subunit. This compound (demethylasterriquinone B-1, DMAQ-B1) was shown to lower blood glucose levels in mouse models of type 2 diabetes mellitus. In recent years, structures and activities of approximately 100 derivatives of this compound have been published. Most of these structures contain a quinone substructure, which might cause toxic side effects. Since treatment of type 2 diabetes includes long-term administration of anti-diabetic compounds, it would be beneficial to find compounds with a different type of structure that activate the insulin receptor. The aim of this dissertation was to build computational models that can be used to screen for new insulin-mimetic compounds, and to validate the models by testing some of the obtained hits in relevant biological (i.e. cell-based) experiments. Three different ligand-based computational methods, namely self-organizing maps, fingerprint similarity and shape similarity, were used to screen a large vendor database for potential insulin receptor-activating compounds. By testing 13 representative compounds from the identified scaffolds, we found three compounds that are able to activate Akt kinase, an important downstream target of the activated insulin receptor. One of the compounds increased glucose uptake in muscle cells.
Derivatives of these compounds were further investigated to gain information on structure-activity relationships. Additionally, the toxicity of the compounds in cells was assessed to show that the insulin-mimetic activity of our identified molecules is not correlated with toxic effects

    Development, validation and application of in-silico methods to predict the macromolecular targets of small organic compounds

    Computational methods to predict the macromolecular targets of small organic drugs and drug-like compounds play a key role in early drug discovery and drug repurposing efforts. These methods are developed by building predictive models that aim to learn the relationships between compounds and their targets in order to predict the bioactivity of the compounds. In this thesis, we analyzed the strategies used to validate target prediction approaches and how current strategies leave crucial questions about performance unanswered. Namely, how does an approach perform on a compound of interest, with its structural specificities, as opposed to the average query compound in the test data? We constructed and present new guidelines on validation strategies to address these short-comings. We then present the development and validation of two ligand-based target prediction approaches: a similarity-based approach and a binary relevance random forest (machine learning) based approach, which have a wide coverage of the target space. Importantly, we applied a new validation protocol to benchmark the performance of these approaches. The approaches were tested under three scenarios: a standard testing scenario with external data, a standard time-split scenario, and a close-to-real-world test scenario. We disaggregated the performance based on the distance of the testing data to the reference knowledge base, giving a more nuanced view of the performance of the approaches. We showed that, surprisingly, the similarity-based approach generally performed better than the machine learning based approach under all testing scenarios, while also having a target coverage which was twice as large. After validating two target prediction approaches, we present our work on a large-scale application of computational target prediction to curate optimized compound libraries. 
While screening large collections of compounds against biological targets is key to identifying new bioactivities, it is resource intensive and challenging. Small to medium-sized libraries that have been optimized to have a higher chance of producing a true hit on an arbitrary target of interest are therefore valuable. We curated libraries of readily purchasable compounds by: i. utilizing property filters to ensure that the compounds have key physicochemical properties and are not overly reactive, ii. applying a similarity-based target prediction method, with a wide target scope, to predict the bioactivities of compounds, and iii. employing a genetic algorithm to select compounds for the library to maximize the biological diversity in the predicted bioactivities. These enriched small to medium-sized compound libraries provide valuable tool compounds to support early drug development and target identification efforts, and have been made available to the community. The distinctive contributions of this thesis include the development and benchmarking of two ligand-based target prediction approaches under novel validation scenarios, and the application of target prediction to enrich screening libraries with biologically diverse bioactive compounds. We hope that the insights presented in this thesis will help push data driven drug discovery forward.
Doktorgradsavhandlin
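The abstract above does not say how its similarity-based target prediction is implemented. As a rough illustration of the general idea only, the sketch below ranks targets by the highest Tanimoto similarity between a query fingerprint and any known ligand of each target. The fingerprints (sets of "on" bit indices), the target names, and the functions `tanimoto` and `predict_targets` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def predict_targets(query, knowledge_base, top_k=3):
    """Rank targets by the highest similarity between the query and any known ligand."""
    best = {}
    for fingerprint, target in knowledge_base:
        score = tanimoto(query, fingerprint)
        if score > best.get(target, -1.0):
            best[target] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Hypothetical reference knowledge base: (ligand fingerprint, annotated target).
knowledge_base = [
    (frozenset({1, 2, 3, 4}), "target_A"),
    (frozenset({1, 2, 8, 9}), "target_A"),
    (frozenset({5, 6, 7}), "target_B"),
    (frozenset({3, 4, 5, 10}), "target_C"),
]
query = frozenset({1, 2, 3, 5})
ranking = predict_targets(query, knowledge_base)
for target, score in ranking:
    print(f"{target}: {score:.3f}")
```

The score of the top-ranked neighbour also indicates how far the query lies from the knowledge base, which is the quantity the abstract uses to disaggregate performance.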

    Machine-learning approaches in drug discovery: methods and applications

    During the past decade, virtual screening (VS) has evolved from traditional similarity searching, which utilizes single reference compounds, into an advanced application domain for data mining and machine-learning approaches, which require large and representative training-set compounds to learn robust decision rules. The explosive growth in the amount of public domain-available chemical and biological data has generated huge effort to design, analyze, and apply novel learning methodologies. Here, I focus on machine-learning techniques within the context of ligand-based VS (LBVS). In addition, I analyze several relevant VS studies from recent publications, providing a detailed view of the current state-of-the-art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances

    Multi-faceted Structure-Activity Relationship Analysis Using Graphical Representations

    A core focus in medicinal chemistry is the interpretation of structure-activity relationships (SARs) of small molecules. SAR analysis is typically carried out on a case-by-case basis for compound sets that share activity against a given target. Although SAR investigations are not a priori dependent on computational approaches, limitations imposed by the steady rise in activity information have necessitated the use of such methodologies. Moreover, understanding SARs in multi-target space is extremely difficult. Conceptually different computational approaches are reported in this thesis for graphical SAR analysis in single- as well as multi-target space. Activity landscape models are often used to describe the underlying SAR characteristics of compound sets. Theoretical activity landscapes that are reminiscent of topological maps intuitively represent distributions of pair-wise similarity and potency difference information as three-dimensional surfaces. These models provide easy access to the identification of various SAR features. Therefore, such landscapes for actual data sets are generated and compared with graph-based representations. Existing graphical data structures are adapted to include mechanism of action information for receptor ligands to facilitate simultaneous SAR and mechanism-related analyses with the objective of identifying structural modifications responsible for switching molecular mechanisms of action. Typically, SAR analysis focuses on systematic pair-wise relationships of compound similarity and potency differences. Therefore, an approach is reported to calculate SAR feature probabilities on the basis of these pair-wise relationships for individual compounds in a ligand set. The consequent expansion of feature categories improves the analysis of local SAR environments. Graphical representations are designed to avoid a dependence on preconceived SAR models. Such representations are suitable for systematic large-scale SAR exploration.
Methods for the navigation of SARs in multi-target space using simple and interpretable data structures are introduced. In summary, multi-faceted SAR analysis aided by computational means forms the primary objective of this dissertation

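    Investigation of the structure of family C GPCRs and their interaction with allosteric modulators using sequence-, structure- and ligand-based methods (Untersuchung der Struktur und Interaktion mit allosterischen Modulatoren der Familie C GPCRs mit Hilfe von Sequenz-, Struktur- und Ligand-basierten Verfahren)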

    This study focuses on structural features of a particular GPCR type, the family C GPCRs. Structure- and ligand-based approaches were adopted to predict novel mGluR5-binding ligands and their binding modes. The objectives of this study were: 1. An analysis of the functional and structural implications of amino acids in the TM region of family C GPCRs. 2. The prediction of the TM domain structure of mGluR5. 3. The discovery of novel selective allosteric modulators of mGluR5 by virtual screening. 4. The prediction of a ligand binding mode for the allosteric binding site in mGluR5. GPCRs are a superfamily of structurally related proteins, although their primary amino acid sequences can be diverse. Using sequence information, a conservation analysis of family C GPCRs was to be applied to reveal characteristic differences and similarities with respect to function, folding and ligand binding. Using experimental data and conservation analysis, the allosteric binding site of mGluR5 was to be characterized with regard to NAM, PAM and selective ligand binding. For further evaluation, experimental knowledge about family A GPCRs as well as conservation between vertebrate rhodopsins was to be compared with the results obtained for family C GPCRs (Section 4.1 Conservation analysis of family C GPCRs). Since no receptor structure is available for any family C GPCR, discussion of conserved sequence positions between family A and C GPCRs requires the prediction of a receptor structure for mGluR5 using a family A receptor as template. To predict the mGluR5 structure, a sequence alignment to a GPCR template protein had to be proposed and GPCR-specific features considered in the structure calculation (Section 4.1.4 Structure prediction of mGluR5). The resulting structure was intended for use in predicting the binding modes of newly discovered active molecules.
For the discovery of novel selective mGluR modulators, several ligand-based virtual screening protocols were adapted and evaluated. Prediction models were derived for the selection of possibly active molecules using a diverse collection of known mGluR-binding ligands. For that purpose, a data collection of known mGluR-binding ligands was to be established and this reference collection analyzed with respect to different ligand activity classes (NAM or PAM) and selective modulators. The prediction of novel NAMs and PAMs using several combinations of 2D, 3D, pharmacophore or molecular shape encoding methods with machine learning techniques and similarity-determining methods was to be tested prospectively (Section 4.2 Virtual screening for novel mGluR modulators). In collaboration with Merz Pharmaceuticals (Merz GmbH & Co. KGaA, Frankfurt am Main, Germany), the modulating effect of a few hundred molecules was to be verified in a functional cell-based assay. To predict a binding mode for the discovered active molecules, molecular docking was to be applied using the allosteric binding site of the modeled mGluR5 structure (Section 4.2.4 Modeling of binding modes). Predicted ligand binding modes were to be correlated with conservation profiles resulting from the sequence-based entropy analysis and with information from mutation experiments, and compared with known ligand binding poses from crystal structures of family A GPCRs.
In this work, concepts for elucidating structural and functional properties of family C G protein-coupled receptors (GPCRs) were developed and applied. Questions concerning the functional mechanism of GPCRs were investigated using different bioinformatics and cheminformatics methods guided by experimental results.
In the course of the work, conserved regions of receptor families A and C were compared on the basis of available experimental data from mutation and ligand binding studies. The conservation analysis was based on the calculation of the Shannon entropy and was determined for a multiple sequence alignment of the transmembrane domains of 96 different family C GPCRs. Conserved regions were interpreted with the help of experimental data and used in particular to define regions in the allosteric binding pocket with regard to selectivity. With the aim of finding new selective allosteric modulators for the metabotropic glutamate receptor subtype 5 (mGluR5), several ligand-based approaches for the virtual prediction of molecular activity were developed and tested. The strategy relied on known ligands, whose structures and activity values could be used to build prediction models. The prospective prediction was based on different methods of similarity calculation and types of molecular encoding. The molecules were tested for their modulatory effect on mGluR5. The measurement recorded changes in the intracellular Ca2+ level. Modulators binding to mGluR5 were additionally tested on mGluR1 to determine selectivity. In total, 8 of the 228 tested molecules were found to be active below 10 µM, among them one positive allosteric modulator. Of the remaining seven negative allosteric modulators (NAMs), five were selective for mGluR5. All identified NAMs were examined by molecular docking for possible interactions with the transmembrane domain of mGluR5. The binding hypothesis corresponded to a superposition of the molecules found and their possible interaction points.
Using mGluR5 as an example, the suitability of a modeled GPCR structure for generating hypotheses about ligand binding and structural relationships could thus be investigated
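The conservation analysis in this abstract rests on the Shannon entropy of a multiple sequence alignment. A minimal sketch of a per-column entropy calculation follows; the four-sequence toy alignment and the function names are hypothetical, and the sketch would treat gap characters as ordinary symbols, which is an assumption and may differ from the thesis.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def alignment_entropies(sequences):
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy(column) for column in zip(*sequences)]

# Hypothetical toy alignment: columns 1-2 are fully conserved, columns 3-4 are not.
alignment = ["ACDW", "ACEW", "ACDY", "ACEY"]
entropies = alignment_entropies(alignment)
print(entropies)
```

An entropy of zero marks a fully conserved position; higher values mark positions where the residues vary across the aligned receptors.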

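    Spatial statistics for the analysis of chemical datasets for the validation of virtual screening techniques (Räumliche Statistik zur Analyse Chemischer Datensätze zur Validierung von Techniken des Virtuellen Screenings)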

    A common finding of many reports evaluating virtual screening methods is that validation results vary considerably with changing benchmark datasets. It is widely assumed that these effects are caused by the redundancy and cluster structure inherent to those datasets. These phenomena manifest themselves in descriptor space, which is termed the dataset topology. A methodology for the characterization of dataset topology based on spatial statistics is introduced. With this methodology it is possible to associate differences in virtual screening performance on different datasets with differences in dataset topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark datasets by a more favorable topology. It is shown that the composition of some benchmark datasets causes topologies that lead to over-optimistic validation results even in very "simple" descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased datasets and provides a tool for the design of unbiased benchmark datasets. General principles for the design of benchmark datasets that are not affected by topological bias were developed. Refined Nearest Neighbor Analysis was used to design benchmark datasets based on PubChem bioactivity data. A workflow is devised that purges unselective hits from datasets of compounds active against pharmaceutically relevant targets. Topological optimization using experimental design strategies was applied to generate corresponding datasets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These datasets provide a tool for a Maximum Unbiased Validation (MUV) of virtual screening methods.
The datasets and a MATLAB toolbox for spatial statistics are freely available on the enclosed CD-ROM or via the internet at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
A finding of many studies on the validation of virtual screening methods is that the results depend strongly on the validation datasets. It is assumed that these effects are caused by the redundancy and cluster structure of the datasets. The mapping of a dataset in descriptor space, the "dataset topology", reflects these phenomena. In this work, a method from the field of spatial statistics is introduced for characterizing dataset topology. With this method, differences in the results of validation experiments can be explained by differences in dataset topology. Moreover, the better performance of some descriptors can be explained by their ability to produce more favorable topologies. The composition of some validation datasets gives rise to topologies that lead to over-optimistic validation results. The methodology presented makes it possible to recognize such datasets before validation. The method can also be used to construct datasets in a targeted way that ensure unbiased validation results. Building on these results, general criteria for the construction of validation datasets are developed. Unbiased datasets are generated using methods of "Refined Nearest Neighbor Analysis". Datasets of compounds with bioactivity from PubChem serve as the basis. A newly developed procedure makes it possible to remove compounds with unspecific bioactivity from these datasets. By optimizing the dataset topology, corresponding datasets of actives and inactives are created that enable a Maximum Unbiased Validation (MUV) of virtual screening techniques.
These datasets and a MATLAB toolbox for spatial statistics are freely available on the enclosed CD-ROM or on the internet at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html

    Rethinking drug design in the artificial intelligence era

    Artificial intelligence (AI) tools are increasingly being applied in drug discovery. While some protagonists point to vast opportunities potentially offered by such tools, others remain sceptical, waiting for a clear impact to be shown in drug discovery projects. The reality is probably somewhere in-between these extremes, yet it is clear that AI is providing new challenges not only for the scientists involved but also for the biopharma industry and its established processes for discovering and developing new medicines. This article presents the views of a diverse group of international experts on the 'grand challenges' in small-molecule drug discovery with AI and the approaches to address them