59 research outputs found

    2D-Qsar for 450 types of amino acid induction peptides with a novel substructure pair descriptor having wider scope

    Background: Quantitative structure-activity relationship (QSAR) analysis of peptides is helpful for designing various types of drugs, such as kinase inhibitors or antigens. Capturing the various properties of peptides is essential for two-dimensional QSAR analysis, and the peptide descriptor is an important element in capturing those properties. The atom pair holographic (APH) code is designed for the description of peptides; it represents peptides as combinations of thirty-six types of key atoms and the intermediate bonds between two key atoms.
Results: The substructure pair descriptor (SPAD) represents peptides as combinations of forty-nine types of key substructures and the sequence of amino acid residues between two substructures. The key substructures are larger and the sequences are longer than in traditional descriptors. Similarity searches on a C5a inhibitor data set and a kinase inhibitor data set showed that the order of inhibitors became three times higher for each data set when peptides were represented with SPAD. Comparing the scope of each descriptor shows that SPAD captures different properties from APH.
Conclusion: QSAR/QSPR for peptides is helpful for designing various types of drugs, such as kinase inhibitors and antigens. SPAD is a novel and powerful descriptor for various types of peptides. The accuracy of QSAR/QSPR improves when peptides are described with SPAD.

    Navigating the chemical space of dipeptidyl peptidase-4 inhibitors

    Classification and interpretation in quantitative structure-activity relationships

    A good QSAR model comprises several components. Predictive accuracy is paramount, but it is not the only important aspect. In addition, one should apply robust and appropriate statistical tests to the models to assess their significance or the significance of any apparent improvements. The real impact of a QSAR, however, perhaps lies in its chemical insight and interpretation, an aspect which is often overlooked. This thesis covers three main topics: a comparison of contemporary classifiers, interpretability of random forests and usage of interpretable descriptors. The selection of data mining technique and descriptors entirely determine the available interpretation. Using interpretable approaches we have demonstrated their success on a variety of data sets. By using robust multiple comparison statistics with eight data sets we demonstrate that a random forest has comparable predictive accuracies to the de facto standard, support vector machine. A random forest is inherently more interpretable than support vector machine, due to the underlying tree construction. We can extract some chemical insight from the random forest. However, with additional tools further insight would be available. A decision tree is easier to interpret than a random forest. Therefore, to obtain useful interpretation from a random forest we have employed a selection of tools. This includes alternative representations of the trees using SMILES and SMARTS. Using existing methods we can compare and cluster the trees in this representation. Descriptor analysis and importance can be measured at the tree and forest level. Pathways in the trees can be compared and frequently occurring subgraphs identified. These tools have been built around the Weka machine learning workbench and are designed to allow further additions of new functionality. The interpretability of a model is dependent on the model and the descriptors. They must describe something meaningful. 
To this end we have used the TMACC descriptors in the Solubility Challenge and literature data sets. We report how our retrospective analysis confirms existing knowledge and how we identify novel C-domain inhibition of ACE. In order to test our hypotheses we extended and developed existing software, forming two applications. The Nottingham Cheminformatics Workbench (NCW) generates TMACC descriptors and allows the user to build and analyse models, including visualising the chemical interpretation. Forest Based Interpretation (FBI) provides various tools for interpreting a random forest model. Both applications are written in Java with full documentation, and simple installation wizards are available for Windows, Linux and Mac.

    Identification of structure activity relationships in primary screening data of high-throughput screening assays

    The aim of the thesis was to identify structure-activity relationships (SAR) in the primary screening data of high-throughput screening (HTS) assays. The strategy was to perform a hierarchical clustering of the molecules, assign the primary screening data to the created clusters and derive models from the clusters. The models should serve to identify singletons, clusters enriched with actives, unconfirmed hits and false negatives. Two hierarchical clustering algorithms, NIPALSTREE and hierarchical k-means, were developed and adapted for this purpose, respectively. A graphical user interface (GUI) was implemented to extract SAR from the clustering results. Retrospective and prospective applications of the clustering approach were performed. SAR models were created by combining the clustering results with different chemoinformatic methods. NIPALSTREE projects a data set onto one dimension using principal component analysis. The data set is sorted according to the scoring vector and split at the median position into two subsets. The algorithm is then applied recursively to the subsets. The hierarchical k-means algorithm recursively separates a data set into two clusters using the k-means algorithm. Both algorithms are capable of clustering large data sets with more than a million data points. They were validated and compared to each other on the basis of different structural classes. NIPALSTREE provided first insights into SAR through its loading vectors, whereas hierarchical k-means yielded superior results. A GUI was developed that allows the clustering results to be displayed and navigated. Functionalities were integrated to analyse the clusters in the dendrogram, the molecules in a cluster, and the physicochemical properties of a molecule. Measures were developed to identify clusters enriched with actives, to characterize singletons and to analyse selectivity and specificity. 
Different protease inhibitors from the COBRA database were examined using the hierarchical k-means algorithm. Supported by similarity searches and nearest-neighbour analyses, thrombin inhibitor singletons were quickly isolated and displayed in the dendrogram. By scaling enrichment factors to the logarithm of the dendrogram level, clusters enriched with different structural classes of factor Xa inhibitors were identified simultaneously. The observed co-clustering of other protease inhibitors provided deeper insight into selectivity and specificity and shows the utility of the approach for constructing focussed screening libraries. Specificity was analysed by extracting and clustering the relative frequencies of the protease inhibitors from the clusters of dendrogram level 7. A unique ligand-based view of the pocketome of the protease enzymes was obtained. To identify unconfirmed hits and false negatives in the primary screening data of HTS assays, three assays were retrospectively analysed with the hierarchical k-means algorithm. A rule catalogue was developed that judges hits in terminal clusters based on the cluster size, the percent control values of the entries in a cluster, the overall hit rate, the hit rate in the cluster and the environment of the cluster in the dendrogram. It identified a high proportion of unconfirmed hits and provided, for each hit, a rating in the context of related non-hits. This allows compounds to be prioritized for follow-up studies. Non-hits and hits were retrieved from terminal clusters containing hits. Molecules bearing false-negative scaffolds were co-extracted and enriched. To minimize the number of false positives in the extracted lists, Bayesian regularized artificial neural network classification models were trained with the data. Applying the models yielded a marked improvement in the enrichment factors for the false negatives, which demonstrates the scaffold-hopping potential of the approach. 
NIPALSTREE, the hierarchical k-means algorithm and self-organising maps were prospectively applied to identify novel lead candidates for dopamine D3 receptors. Compounds with novel scaffolds and low nanomolar binding affinity (65 nM, compound 42) were identified. To provide deeper insight into the SAR of these molecules, alternative computational methods were employed. Support vector-based regression and partial least squares were examined. Predictive models for dopamine D2 and D3 receptor binding affinity values were obtained. Important features explaining SAR were extracted from the models. The prospective application of the models to the diverse and novel virtual screening data was of limited success only. Docking studies were performed using a homology model of the dopamine D3 receptor. Visual inspection of the binding modes resulted in the hypothesis of two alternative binding pockets for the aryl moiety of dopamine D3 receptor antagonists. A pharmacophore model was created that requires both aryl moieties simultaneously. Virtual screening with the model identified a nanomolar hit (65 nM, compound 59), corroborating the hypothesis of the two binding pockets and providing a new lead structure for dopamine D3 receptors. The presented data show that hierarchically clustering a data set and subsequently using the clusters for model generation is suited to extracting SAR from screening data. The models are successful in identifying singletons, clusters enriched with actives, unconfirmed hits and false-negative scaffolds.
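
The NIPALSTREE procedure summarised in this abstract (project onto one dimension with principal component analysis, sort by score, split at the median, recurse) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the `min_size` and `max_depth` stopping rules, and the choice of a plain NIPALS iteration to obtain the first principal component are all assumptions made for illustration.

```python
from statistics import fmean


def first_component_scores(rows, iterations=100, tol=1e-10):
    """Scores of each row on the first principal component (NIPALS iteration).

    Hypothetical helper: the abstract only states that PCA is used.
    """
    n_cols = len(rows[0])
    # Mean-centre every column before extracting the component.
    means = [fmean(row[j] for row in rows) for j in range(n_cols)]
    x = [[row[j] - means[j] for j in range(n_cols)] for row in rows]

    # Start from the column with the largest sum of squares.
    start = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in x))
    t = [r[start] for r in x]
    for _ in range(iterations):
        tt = sum(v * v for v in t)
        if tt == 0.0:  # all points identical: nothing to project
            return t
        # Loading vector p = X^T t / (t^T t), scaled to unit length.
        p = [sum(x[i][j] * t[i] for i in range(len(x))) / tt
             for j in range(n_cols)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        # Updated score vector t = X p.
        t_new = [sum(r[j] * p[j] for j in range(n_cols)) for r in x]
        converged = sum((a - b) ** 2 for a, b in zip(t, t_new)) < tol
        t = t_new
        if converged:
            break
    return t


def nipalstree(rows, ids=None, min_size=2, depth=0, max_depth=10):
    """Recursively split the data at the median of the first-component scores.

    Returns a nested structure: a tuple (left, right) for an internal node,
    a list of row indices for a terminal cluster. The stopping rules
    (min_size, max_depth) are assumptions, not taken from the abstract.
    """
    if ids is None:
        ids = list(range(len(rows)))
    if len(ids) <= min_size or depth >= max_depth:
        return ids
    scores = first_component_scores([rows[i] for i in ids])
    # Sort the row indices by their score, then cut at the median position.
    order = [i for _, i in sorted(zip(scores, ids))]
    half = len(order) // 2
    return (nipalstree(rows, order[:half], min_size, depth + 1, max_depth),
            nipalstree(rows, order[half:], min_size, depth + 1, max_depth))
```

Calling `nipalstree(descriptor_rows, min_size=50)` on a list of equal-length numeric descriptor vectors returns the dendrogram as nested tuples. Because every split is at the median, the tree stays balanced whatever the data distribution.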

    Development of in silico models for the prediction of toxicity incorporating ADME information

    Drug discovery is a process that requires a significant investment in both time and resources. Although recent developments have reduced the number of drugs failing at the later stages of development due to poor pharmacokinetic and/or toxicokinetic profiles, late-stage attrition of drug candidates remains a problem. Additionally, there is a need to reduce animal testing for toxicological risk assessment for ethical and financial reasons. In silico methods offer an alternative that can address these challenges. A variety of computational approaches have been developed in the last two decades; these must be evaluated to ensure confidence in their use. The research presented in this thesis assessed a range of existing tools for the prediction of toxicity and absorption, distribution, metabolism and elimination (ADME) parameters, with an emphasis on absorption and xenobiotic metabolism. These two ADME properties largely determine the bioavailability of a drug and, in turn, also influence toxicity. In vitro approaches (Caco-2 cells and the parallel artificial membrane permeation assay) and in silico approaches, such as various druglikeness filters, can be used to estimate human intestinal absorption; the different methods were compared to identify the relative strengths and weaknesses of the approaches. In terms of xenobiotic metabolism, it is not only important to predict metabolites correctly, but also crucial to identify those compounds that can be biotransformed into species that can covalently bind to biomolecules. Structural alerts are routinely used to screen for such potential reactive metabolites. The balance between sensitivity and specificity of such reactive metabolite alerts is discussed in the context of correctly predicting reactive metabolites of pharmaceuticals (using data available from DrugBank). Off-target toxicity, exemplified by human Ether-à-go-go-Related Gene (hERG) channel inhibition, was also explored. 
A number of novel structural alerts for hERG toxicity were developed based on groups of structurally similar compounds. Finally, the importance of predicting potential ecotoxicological effects of drugs was also considered. The utility of zebrafish embryos to distinguish between baseline and excess toxicity was investigated. In evaluating this selection of existing tools, improvements to the methods have been proposed where possible.

    IN SILICO METHODS FOR DRUG DESIGN AND DISCOVERY

    Computer-aided drug design (CADD) methodologies are playing an ever-increasing role in drug discovery and are critical in the cost-effective identification of promising drug candidates. These computational methods are relevant in limiting the use of animal models in pharmacological research, in aiding the rational design of novel and safe drug candidates, and in repositioning marketed drugs, supporting medicinal chemists and pharmacologists during the drug discovery trajectory. Within this field of research, we launched a Research Topic in Frontiers in Chemistry in March 2019 entitled “In silico Methods for Drug Design and Discovery,” which involved two sections of the journal: Medicinal and Pharmaceutical Chemistry and Theoretical and Computational Chemistry. For the reasons mentioned, this Research Topic attracted the attention of scientists and received a large number of submitted manuscripts. Of these, 27 Original Research articles, five Review articles, and two Perspective articles have been published within the Research Topic. The Original Research articles cover most of the topics in CADD, reporting advanced in silico methods in drug discovery, while the Review articles offer a point of view on some computer-driven techniques applied to drug research. Finally, the Perspective articles provide a vision of specific computational approaches with an outlook on the modern era of CADD.

    Study and Design of Kynurenine Aminotransferase-II Inhibitors for the Treatment of Neurological Conditions

    The majority of tryptophan metabolism passes through the kynurenine pathway. Metabolic imbalances in this pathway are implicated in disease. Kynurenic acid (KYNA), produced by the kynurenine aminotransferase (KAT) enzymes through transamination, is elevated in patients with schizophrenia. Schizophrenia is a neuropsychiatric disease with limited treatment options and debilitating symptoms. Glutamatergic systems are thought to have a significant role in its pathogenesis, providing a basis by which KYNA, an endogenous glutamate antagonist, is implicated in the disease. Four pyridoxal 5’-phosphate-dependent homologues of KAT are reported. KAT-II is primarily responsible for KYNA production in the human brain. KAT-II inhibitors reduce KYNA production, increase neurotransmitter release and elicit pro-cognitive effects, indicating their potential as novel therapies for schizophrenia. In this work, surface plasmon resonance was employed to screen a fragment library, from which two fragments, F6037-0164 and F0037-7280, were pursued (IC50 of 524.5 μM and 115.2 μM, respectively). Another strategy was to consider estrogen compounds, as schizophrenia is a sexually dimorphic condition in which female patients have reduced estrogen levels. Enzyme inhibitory assays showed estradiol disulfate to be a strong inhibitor of KAT-I and KAT-II (IC50: 291.5 μM and 26.3 μM, respectively), with estradiol, estradiol 3-sulfate and estrone sulfate inhibiting weakly. Molecular modelling suggests that the 17-sulfate moiety in estradiol disulfate improves its potency by 10-100 fold compared to estradiol. This 17-sulfate moiety was mimicked on existing KAT-II inhibitor scaffolds to develop two novel inhibitors, JN-01 and JN-02, with improved potencies (IC50: 73.8 μM and 112.8 μM, respectively). Co-crystallisation studies resulted in the determination of a human KAT-II crystal structure (PDB ID: 6D0A) at 1.47 Å resolution, the highest-resolution structure provided for KAT-II, with the fewest structural inconsistencies.

    Computational approaches in supramolecular chemistry with a special focus on virtual screening

    Within this thesis, novel computational tools for the rational design of synthetic host-guest complexes (SHGC) were developed and applied that employ the concepts of efficient virtual screening (VS) approaches. The first part describes the development of a fast structure prediction tool for flexible SHGC. The tool was validated on a test data set comprising crystallographically determined SHGC. In nine of ten cases near-native solutions were generated. The tool can be applied for VS. In the second part of the thesis, computational techniques were applied to design SHGC based on β-cyclodextrins (β-CD). We performed a structure-based inverse virtual screening to identify modified β-CDs as receptors for the anticancer drug camptothecin (CPT). Six of the proposed receptors exhibited binding affinities that were significantly higher than for any other CPT receptor. Furthermore, we applied a combination of a similarity-based virtual screening technique with a regression model (RM) to identify novel high-affinity guest molecules of β-CD. Ten of the proposed guest molecules exhibited a binding free energy lower than -20 kJ/mol. The last chapter describes a comparison of regression methods regarding their ability to generate predictive RM for thermodynamic parameters (dG, dH and dS) of β-CD-guest complexes. dG could be predicted in good agreement with experimental values, but none of the methods led to comparably good predictive models for dH, and dS appears almost unpredictable.