    Application and Development of Computational Methods for Ligand-Based Virtual Screening

    The detection of novel active compounds that are able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds having the desired bioactivity in a large collection of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify those compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information of one or more known active compounds in order to identify new hit compounds. Different LBVS methods exist, e.g. similarity searching and support vector machines (SVMs). In order to enable the application of these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the usage of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance range against a wide range of pharmaceutical targets is globally estimated through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find those fingerprint features that are most relevant for the active compounds and distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. In a first step, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets. 
SVM linear combination is used to recover compounds with desired activity profiles and deprioritize compounds with other activities. Then, the SVM methodology is adopted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design to direct search calculations toward the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds having large differences in their potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
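
The fingerprint-based similarity searching mentioned in this abstract can be illustrated with a minimal sketch. It assumes each 2D fingerprint is represented as a set of "on" bit positions and uses the Tanimoto coefficient, a common choice for fingerprint comparison; the abstract does not state which similarity measure the thesis uses. The function names, compound names, and bit values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    reference: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank library compounds by decreasing similarity to one reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    # Sort by score (descending), then by name to make ties deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy data: hypothetical bit positions, not real fingerprints.
reference = frozenset({1, 4, 7, 9})
library = {
    "cpd_a": frozenset({1, 4, 7, 9}),
    "cpd_b": frozenset({1, 4, 8}),
    "cpd_c": frozenset({2, 3, 5}),
}
ranking = similarity_search(reference, library)
```

In practice, fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the ranking logic stays the same.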

    Critical comparison of virtual screening methods against the MUV data set

    In the current work, we measure the performance of seven ligand-based virtual screening tools (five similarity search methods and two pharmacophore elucidators) against the MUV data set. For the similarity search tools, single active molecules as well as sets of active compounds clustered by chemical diversity were used as templates. Their scores were calculated against all inactive and active compounds in their target class. Subsequently, the scores were used to calculate different performance metrics, including enrichment factors and AUC values. We also studied the effect of data fusion on the results. To measure the performance of the pharmacophore tools, a set of active molecules was picked from each target class, either at random or based on chemical diversity, to build a pharmacophore model, which was then used to screen the remaining compounds in the set. Our results indicate that template sets selected by their chemical diversity are the best choice for similarity search tools, whereas the optimal training sets for pharmacophore elucidators are based on random selection, underscoring that pharmacophore modeling cannot be easily automated. We also suggest a number of improvements for future benchmark sets and discuss activity cliffs as a potential problem in ligand-based virtual screening.
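
As a rough illustration of the two performance metrics named in this abstract, the sketch below computes an enrichment factor and a ROC AUC from screening scores. It assumes that higher scores mean "more likely active" and that labels are 1 for active and 0 for inactive; the function names and toy data are hypothetical, and the exact metric definitions used in the paper may differ.

```python
from typing import List, Sequence


def rank_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the labels ordered by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def enrichment_factor(ranked_labels: Sequence[int], fraction: float) -> float:
    """Hit rate in the top fraction of the ranking over the overall hit rate."""
    n = len(ranked_labels)
    n_top = max(1, int(round(n * fraction)))
    actives_total = sum(ranked_labels)
    if actives_total == 0:
        return 0.0
    actives_top = sum(ranked_labels[:n_top])
    return (actives_top / n_top) / (actives_total / n)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random active outscores a random inactive."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    # Count pairwise wins; ties contribute half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy screen: 10 compounds, 2 actives (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(rank_labels(scores, labels), fraction=0.2)
auc = roc_auc(scores, labels)
```

The pairwise AUC computation is quadratic in the number of compounds, which is fine for a toy example; a rank-based formula or a library routine would be preferable for data sets of realistic size.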

    Machine Learning Methodologies for Interpretable Compound Activity Predictions

    Machine learning (ML) models have gained attention for mining the pharmaceutical data that are currently generated at unprecedented rates and for potentially accelerating the discovery of new drugs. The advent of deep learning (DL) has also raised expectations in pharmaceutical research. A central task in drug discovery is the initial search for compounds with desired biological activity. ML algorithms are able to find patterns in compound structures that are related to bioactivity, the so-called structure-activity relationships (SARs). ML-based predictions can complement biological testing to prioritize further experiments. Moreover, insights into model decisions are highly desired for further validation and identification of activity-relevant substructures. However, the interpretation of complex ML models remains essentially prohibitive. This thesis focuses on ML-based predictions of compound activity against multiple biological targets. Single-target and multi-target models are generated for relevant tasks including the prediction of profiling matrices from screening data and the discrimination between weak and strong inhibitors for more than a hundred kinases. Moreover, the relative performance of distinct modeling strategies is systematically analyzed under varying training conditions, and practical guidelines are reported. Since explainable model decisions are a clear requirement for the utility of ML bioactivity models in pharmaceutical research, methods for the interpretation and intuitive visualization of activity predictions from any ML or DL model are introduced. Taken together, this dissertation presents contributions that advance the application and rationalization of ML models for biological activity and SAR predictions.

    CavKA – Cavity Knowledge Acceleration: Development of a new structure-based method for pharmacophore elucidation

    Pharmacophore searches are an important technique in virtual screening. Due to their abstract nature, pharmacophore models are known for their high scaffold-hopping potential. Since the number of freely available X-ray structures is constantly growing, structure-based pharmacophore elucidation is a promising alternative to ligand-based techniques. The goal of this work was the implementation and validation of a method for structure-based pharmacophore elucidation based on ligand-receptor complexes. The developed CavKA method employs geometrical criteria and contoured Molecular Interaction Fields (MIFs) of the GRID program as an energetic term for model creation. Models based on the combined approach reliably detected key interactions (so-called hotspots) in the interface between ligand and binding site. Including MIF information made it possible to detect essential ligand-receptor interactions as well as water molecules mediating ligand-receptor interactions, which improved the pharmacophore model when they were included. CavKA was compared to the well-established Ligandscout technique. In most cases, models derived by CavKA performed better in a retrospective validation study than those derived by Ligandscout. Low enrichments for some of the studied targets could be explained by the less than ideal design of the Fieldscreen benchmark dataset. Additionally, it was studied whether model building is possible without user intervention. To this end, two alternative approaches were implemented. The E-MIF approach employed only MIF information and no geometrical rules. CavKA-Hybrid combined the advantages of ligand-centric and structure-based pharmacophores. Pharmacophoric properties can be defined in a number of ways, and the influence of the type of definition on the power of ligand-centric approaches was also studied here. Rule-based definitions performed better than shape-based approaches, and methods employing electrostatic potentials performed worse than both. If ligands bind to a target in different modes, enrichments could be improved by using PLIDriPaS, a new technique that employs so-called Parallel Screening. In summary, two general trends were observed for structure-based pharmacophore models in virtual screening: First, the CavKA method, which combines geometrical and energetic criteria, demonstrated superior performance compared to the purely geometrical Ligandscout method. Second, the best screening results were obtained by using rule-based definitions for pharmacophoric features.