194 research outputs found

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Computer-aided drug design (CADD) has become an indispensable component of modern drug discovery projects. Predicting the physicochemical and pharmacological properties of candidate compounds effectively increases the probability that drug candidates pass the later phases of clinical trials. Ligand-based virtual screening exhibits advantages over structure-based drug design in terms of its wide applicability and high computational efficiency. Established chemical repositories and reported bioassays form a gigantic knowledgebase from which to derive quantitative structure-activity relationships (QSAR) and structure-property relationships (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), is reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity for cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented with a graphical user interface, data import/export, automated model training/prediction, and project management. In addition, a non-linear ligand classifier was proposed, using a novel Topomer kernel function in a support vector machine. With an emphasis on green high-performance computing, graphics processing units offer an alternative platform for computationally expensive tasks; a novel GPU algorithm was designed and implemented to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported that constructs structurally diverse screening libraries to enhance hit rates in high-throughput screening.
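LiCABEDS combines adaptive boosting with single-feature decision stumps over fingerprint bits. The published implementation is not reproduced here, but the core idea can be sketched as a toy AdaBoost over binary fingerprints (all names and the data layout are illustrative assumptions):

```python
import math

def train_licabeds_like(X, y, n_rounds=10):
    """Toy AdaBoost of decision stumps over binary fingerprint bits,
    in the spirit of LiCABEDS (illustrative, not the published code).
    X: list of bit-lists; y: class labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform sample weights
    ensemble = []                          # (bit index, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for pol in (1, -1):
                # stump predicts pol if bit j is set, else -pol
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if (pol if xi[j] else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        err = max(err, 1e-12)              # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((j, pol, alpha))
        # re-weight: up-weight misclassified samples
        w = [wi * math.exp(-alpha * yi * (pol if xi[j] else -pol))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * (pol if x[j] else -pol) for j, pol, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In practice each bit would correspond to a substructure feature of a molecular fingerprint, and the learned alphas indicate which features drive the categorical property being modeled.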

    Application and Development of Computational Methods for Ligand-Based Virtual Screening

    The detection of novel active compounds able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds with the desired bioactivity in a large collection of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify those compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information of one or more known active compounds to identify new hit compounds. Different LBVS methods exist, e.g., similarity searching and support vector machines (SVMs). To enable the application of these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the usage of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance against a wide range of pharmaceutical targets is estimated globally through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find those fingerprint features that are most relevant for the active compounds and distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. In a first step, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets.
SVM linear combination is used to recover compounds with desired activity profiles and deprioritize compounds with other activities. Then, the SVM methodology is adapted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design to direct search calculations toward the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds having large differences in potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
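The SVM linear combination idea can be sketched as a ranking step: given decision-function values from independently trained per-target SVMs, compounds are ordered by a signed sum that rewards desired activities and penalizes undesired ones. The data layout and function name below are illustrative assumptions, not the thesis code:

```python
def svm_linear_combination(decision_values, desired, undesired):
    """Rank compounds by a signed combination of per-target SVM
    decision values: reward activity against desired targets,
    penalize activity against undesired ones.
    decision_values: {compound_id: {target: decision value f(x)}}."""
    def score(cpd):
        vals = decision_values[cpd]
        return (sum(vals[t] for t in desired)
                - sum(vals[t] for t in undesired))
    # highest combined score first = best match to the activity profile
    return sorted(decision_values, key=score, reverse=True)
```

A potency-directed variant would additionally weight each decision value by the potency of the support vectors behind it, so that potent hits rise in the ranking.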

    A prospective compound screening contest identified broader inhibitors for Sirtuin 1

    Potential inhibitors of a target biomolecule, the NAD-dependent deacetylase Sirtuin 1, were identified by a contest-based approach in which participants were asked to propose a prioritized list of 400 compounds from a designated library of 2.5 million compounds using in silico methods and scoring. Our aim was to identify inhibitors of the target enzyme and to benchmark computer-aided drug discovery methods under the same experimental conditions. Collecting compound lists derived from various methods aggregates compounds with more structurally diversified properties than the use of a single method. The inhibitory action on Sirtuin 1 of approximately half of the proposed compounds was experimentally assessed. Ultimately, seven structurally diverse compounds were identified.

    Virtual compound screening and SAR analysis: method development and practical applications in the design of new serine and cysteine protease inhibitors

    Virtual screening is an important tool in drug discovery that uses different computational methods to screen chemical databases for the identification of possible drug candidates. Most virtual screening methodologies are knowledge-driven, so information on either the nature of the target binding pocket or the type of ligand expected to bind is essential. In this regard, the information contained in X-ray crystal structures of protein-ligand complexes provides detailed insight into the interactions between the protein and the ligand and opens the opportunity for further understanding of drug action and structure-activity relationships at the molecular level. Protein-ligand interaction information can be utilized to introduce target-specific, interaction-based constraints in the design of focused combinatorial libraries. It can also be directly transformed into structural interaction fingerprints and applied in virtual screening to analyze docking studies or filter compounds. However, the integration of protein-ligand interaction information into two-dimensional compound similarity searching has not been fully explored. Therefore, novel methods are still required to efficiently utilize protein-ligand interaction information in two-dimensional ligand similarity searching. Furthermore, the application of protein-ligand interaction information to the interpretation of SARs at the ligand level needs further exploration. Thus, the utilization of three-dimensional protein-ligand interaction information in virtual screening and SAR analysis was the major aim of this thesis. The thesis is presented in two major parts. In the first part, three-dimensional protein-ligand interaction information is used to develop a new hybrid virtual screening method and to analyze the nature of SARs in analog series at the molecular level.
The second part of the thesis focuses on the application of different virtual screening methods to identify new inhibitors of cysteine and membrane-bound serine proteases. In addition, molecular modeling studies were applied to analyze the binding mode of structurally complex cyclic peptide inhibitors.
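As a rough illustration of the structural interaction fingerprints mentioned above, each (binding-site residue, interaction type) pair can be encoded as one bit of a fixed-length vector; the residue names, interaction types, and contact format below are invented for the example:

```python
# Interaction types tracked per residue (an illustrative, fixed ordering).
INTERACTION_TYPES = ("hbond_donor", "hbond_acceptor", "hydrophobic", "ionic")

def interaction_fingerprint(contacts, residues):
    """Toy structural interaction fingerprint: one bit per
    (binding-site residue, interaction type) pair.
    contacts: set of (residue, interaction_type) tuples observed in a
    protein-ligand complex; residues: ordered binding-site residues."""
    bits = []
    for res in residues:
        for itype in INTERACTION_TYPES:
            bits.append(1 if (res, itype) in contacts else 0)
    return bits
```

Because the result is an ordinary bit vector, the same similarity measures used for 2D fingerprints (e.g., the Tanimoto coefficient) apply directly, which is what makes hybrid 2D/interaction-based searching possible.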

    From Knowledgebases to Toxicity Prediction and Promiscuity Assessment

    Polypharmacology marked a paradigm shift in drug discovery from the traditional ‘one drug, one target’ approach to a multi-target perspective, indicating that highly effective drugs favorably modulate multiple biological targets. This ability of drugs to show activity towards many targets is referred to as promiscuity, an essential phenomenon that may also lead to undesired side effects. While activity at therapeutic targets provides the desired biological response, toxicity often results from non-specific modulation of off-targets. Safety, efficacy, and pharmacokinetics have been the primary concerns behind the failure of the majority of candidate drugs. Computer-based (in silico) models that can predict pharmacological and toxicological profiles complement the ongoing efforts to lower the high attrition rates. High-confidence bioactivity data is a prerequisite for the development of robust in silico models. Additionally, data quality has been a key concern when integrating data from publicly accessible bioactivity databases. The majority of bioactivity data originates from high-throughput screening campaigns and the medicinal chemistry literature. However, large numbers of screening hits are considered false positives for a number of reasons. In stark contrast, many compounds do not demonstrate biological activity despite being tested in hundreds of assays. This thesis employs cheminformatics approaches to contribute to these diverse, yet highly related, aspects that are crucial for rationalizing and expediting drug discovery. Knowledgebase resources of approved and withdrawn drugs were established and enriched with information integrated from multiple databases. These resources are useful not only in small molecule discovery and optimization, but also in the elucidation of mechanisms of action and off-target effects.
In silico models were developed to predict the effects of small molecules on nuclear receptor and stress response pathways and on the potassium channel encoded by the human Ether-à-go-go-Related Gene (hERG). Chemical similarity and machine-learning based methods were evaluated, highlighting the challenges involved in developing robust models from public-domain bioactivity data. Furthermore, the true promiscuity of potentially frequent-hitter compounds was identified and their mechanisms of action were explored at the molecular level by investigating target-ligand complexes. Finally, the chemical and biological spaces of extensively tested, yet inactive, compounds were investigated to reconfirm their potential as promising candidates.
Polypharmacology describes a paradigm shift from "one drug, one target" to "one drug, many targets" and indicates that highly effective drugs unfold their full effect only through interaction with multiple targets. The biological activity of a drug is directly associated with its side effects, which can be explained by interactions with therapeutic targets and off-targets (promiscuity). An imbalance of these interactions often results in insufficient efficacy, toxicity, or unfavorable pharmacokinetics, which accounts for the failure of many potential drugs in their preclinical and clinical development phases. Early prediction of pharmacological and toxicological profiles from the chemical structure using computational (in silico) models can help improve the drug development process. A prerequisite for successful prediction is reliable bioactivity data; however, data quality is often a central problem in data integration. The cause is the use of different bioassays and readouts, whose data largely stem from primary and confirmatory assays. While a large fraction of primary-assay hits are classified as false positives, some compounds show no biological activity despite extensive testing in both assay types ("extensively assayed compounds"). In this work, various cheminformatics methods were developed and applied to address the aforementioned problems, point out solutions, and ultimately accelerate drug discovery. To this end, non-redundant, manually validated knowledgebases of approved and withdrawn drugs were established and enriched with further information to advance the discovery and optimization of small organic molecules; a decisive element is the elucidation of their mechanisms of action and off-target interactions. For the further characterization of side effects, the focus was placed on nuclear receptors, pathways involving stress receptors, and the hERG channel, which were simulated with in silico models. These models were built with an integrative approach combining state-of-the-art algorithms such as similarity comparisons and machine learning. To ensure a high level of predictive quality, data quality and chemical diversity were explicitly considered during data set evaluation. The in silico models were further extended with a closer examination of substructure filters in order to distinguish genuine mechanisms of action from nonspecific binding behavior (false-positive compounds). Finally, the chemical and biological space of extensively assayed but inactive small organic molecules was examined and compared with currently approved drugs to confirm their potential as promising candidates.

    Computational Analysis of Structure-Activity Relationships: From Prediction to Visualization Methods

    Understanding how structural modifications affect the biological activity of small molecules is one of the central themes of medicinal chemistry. By no means is structure-activity relationship (SAR) analysis a priori dependent on computational methods. However, as molecular data sets grow in size, we quickly reach the limits of our ability to access and compare structures and associated biological properties, so that computational data processing and analysis often become essential. Here, different types of approaches of varying complexity for the analysis of SAR information are presented, which can be applied in the context of screening and chemical optimization projects. The first part of this thesis is dedicated to machine-learning strategies that aim at de novo ligand prediction and the preferential detection of potent hits in virtual screening. High emphasis is put on benchmarking different strategies and thoroughly evaluating their utility in practical applications. However, an often claimed disadvantage of these prediction methods is their "black box" character, because they do not necessarily reveal which structural features are associated with biological activity. Therefore, these methods are complemented by more descriptive SAR analysis approaches with a higher degree of interpretability. Concepts from information theory are adapted to identify activity-relevant structure-derived descriptors. Furthermore, compound data mining methods exploring prespecified properties of available bioactive compounds on a large scale are designed to systematically relate molecular transformations to activity changes. Finally, these approaches are complemented by graphical methods that primarily help to access and visualize SAR data in congeneric series of compounds and allow the formulation of intuitive SAR rules applicable to the design of new compounds. The compendium of SAR analysis tools introduced in this thesis investigates SARs from different perspectives.
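The idea of systematically relating molecular transformations to activity changes can be sketched with a toy matched-molecular-pair grouping. The pre-fragmented (core, substituent, potency) input below is an assumption for illustration; real matched-pair analysis derives cores by systematic bond cleavage:

```python
from collections import defaultdict

def matched_pairs(records):
    """Toy matched-molecular-pair analysis.
    records: (core, substituent, pIC50) tuples for a compound set.
    Returns transformations 'subst_a>>subst_b' between compounds that
    share the same core, with the associated potency change."""
    by_core = defaultdict(list)
    for core, sub, pot in records:
        by_core[core].append((sub, pot))
    transforms = []
    for core, members in by_core.items():
        # every pair of compounds sharing the core defines a transformation
        for i, (sa, pa) in enumerate(members):
            for sb, pb in members[i + 1:]:
                transforms.append((f"{sa}>>{sb}", round(pb - pa, 2)))
    return transforms
```

Aggregating the potency deltas of one transformation across many cores reveals whether that structural change tends to increase or decrease activity, which is the kind of intuitive SAR rule the graphical methods aim to surface.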

    Unified processing framework of high-dimensional and overly imbalanced chemical datasets for virtual screening.

    Virtual screening in drug discovery involves processing large datasets of unknown molecules in order to find the ones that are likely to have the desired effect on a biological target, typically a protein receptor or an enzyme. Molecules are thereby classified as active or inactive in relation to the target. Misclassification of molecules in settings such as drug discovery and medical diagnosis is costly, both in time and in money. In the process of discovering a drug, it is mainly the inactive molecules classified as active towards the biological target, i.e., false positives, that cause delays and high late-stage attrition. However, despite the pool of techniques available, selecting the suitable approach in each situation remains a major challenge. This PhD thesis develops a pioneering framework that enables the analysis of virtual screening of chemical compound datasets in a wide range of settings in a unified fashion. The proposed method provides a better understanding of the dynamics of combining data processing and classification methods in order to screen massive, potentially high-dimensional and overly imbalanced datasets more efficiently.
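One of the simplest rebalancing steps such a framework might include is random undersampling of the inactive majority class before training a classifier. This minimal sketch is illustrative only and is not the thesis method:

```python
import random

def undersample(actives, inactives, ratio=1.0, seed=0):
    """Randomly undersample the inactive majority class so that the
    training set contains roughly `ratio` inactives per active.
    Returns the combined, rebalanced training set (actives first)."""
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(len(inactives), int(len(actives) * ratio))
    return actives + rng.sample(inactives, k)
```

In screening data the imbalance is often extreme (a handful of actives among millions of inactives), so practical pipelines combine such sampling with cost-sensitive learning rather than relying on either alone.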

    Exploring high thermal conductivity polymers via interpretable machine learning with physical descriptors

    The efficient and economical exploitation of polymers with high thermal conductivity is essential to solve the issue of heat dissipation in organic devices. Currently, the experimental preparation of functional high-thermal-conductivity polymers remains a trial-and-error process because of the many degrees of freedom in synthesis and characterization. In this work, we propose a high-throughput screening framework for polymer chains with high thermal conductivity based on interpretable machine learning and physical-feature engineering. The polymer thermal conductivity datasets for training were first collected by molecular dynamics simulation. Inspired by drug-like small molecule representations and molecular force fields, 320 polymer monomer descriptors were calculated, and 20 optimized descriptors with physical meaning were extracted by hierarchical down-selection. All the machine learning models achieve a prediction accuracy R² greater than 0.80, superior to that obtained with traditional graph descriptors. Further, the cross-sectional area and dihedral stiffness descriptors were identified as making positive and negative contributions to thermal conductivity, respectively, and 107 promising polymer structures with thermal conductivity greater than 20.00 W/mK were obtained. Mathematical formulas for predicting polymer thermal conductivity were also constructed by symbolic regression. The high-thermal-conductivity polymer structures are mostly π-conjugated; their overlapping p-orbitals make it easy to maintain strong chain stiffness and large group velocities. The proposed data-driven framework should facilitate the theoretical and experimental design of polymers with desirable properties.
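The R² screening criterion mentioned above (models were retained only if R² exceeded 0.80) is the standard coefficient of determination, which can be computed directly from predictions and reference values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Equals 1.0 for perfect predictions and 0.0 for a model no better
    than predicting the mean of the reference values."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

In a descriptor-screening workflow like the one described, this metric would be evaluated on held-out simulation data for each candidate model before the down-selected descriptors are accepted.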

    Modeling Metalloenzymes: 3D-QSAR Studies on Carbonic Anhydrase Isoenzymes and Virtual Screening for Peptide Deformylase Inhibitors

    Modern drug development distinguishes two phases, lead identification and lead optimization; this dissertation contributes to both. The first part concerns the development and evaluation of computational models for predicting affinity and selectivity and thus belongs to lead optimization. Selectivity is important because it strongly influences the risk of side effects. QSAR methods were applied to model and predict affinity and selectivity parameters. Carbonic anhydrases (CAs) served as the model system; these zinc-containing hydrolases catalyze the reversible hydration of carbon dioxide to bicarbonate and a proton. They are therefore involved in a multitude of (patho)physiological processes and represent interesting therapeutic targets. The numerous CA isoenzymes have very similar physicochemical properties in the binding-pocket region, so the development of selective inhibitors is not a trivial problem. The studies focused in particular on 3D-QSAR methods. Statistically highly significant and robust models were derived to predict the affinity and selectivity of sulfonamide inhibitors for the isoenzymes CA I, II, and IV. The small differences between structure-based alignments in the three binding pockets turned out to have only a minor influence on the statistical parameters, and better results were obtained when the CA II-based alignment was used for all isoenzymes instead of the alignment in the respective binding pocket, probably because the many crystal structures available for CA II make that alignment more reliable.
The resulting contour maps allowed the models to be interpreted with respect to the importance of physicochemical properties for affinity and selectivity. Comparison with qualitative protein-based contour maps underlines the complementary character of the two approaches: while ligand-based QSAR methods implicitly reflect part of the binding-pocket structure but also depend on the peculiarities of the training set, protein-based analyses can also provide information about regions of the binding pocket that form no interactions with the training-set ligands. A further goal was to use QSAR methods to screen larger databases, which allows particularly interesting (i.e., potent or selective) candidates to be identified for synthesis in the course of lead optimization. For the 3D-QSAR methods, a protocol automating the alignment first had to be developed and validated; a ligand-based alignment achieved results comparable to manual structure-based alignment methods. The 3D models proved superior to fragment-based 2D methods and especially to property-based 1D methods. As a practical application of the developed models, a virtual ligand library comprising several thousand entries was built and scored with the best-performing models. The second part of the thesis describes a virtual screening for novel inhibitors of peptide deformylases (PDFs) and thus belongs to lead identification. PDFs are (mostly) iron-containing enzymes that catalyze the deformylation of proteins synthesized in mitochondria, plastids, or bacteria. Starting from crystal structures of potent PDF inhibitors, 3D pharmacophore models were developed and validated.
These models were able to retrieve structurally diverse inhibitors known from the literature (sufficient sensitivity) while greatly reducing the size of the databases to be searched (sufficient specificity). The pharmacophore models were used to screen databases of commercially available molecules with drug-like properties. Through docking and scoring, eleven compounds were finally selected from about two million and purchased for biological testing. Initial measurements show that at least two of these compounds, with IC50 values of 60 nM and 190 nM, are potent inhibitors of PDF1B from E. coli, demonstrating the quality of the models and of the screening protocol. PDF inhibitors could find application as herbicides, antibiotics, and antimalarial therapeutics.
