104 research outputs found

    Computational methods and tools to predict cytochrome P450 metabolism for drug discovery

    Get PDF
    In this review, we present important, recent developments in the computational prediction of cytochrome P450 (CYP) metabolism in the context of drug discovery. We discuss in silico models for the various aspects of CYP metabolism prediction, including CYP substrate and inhibitor predictors, site of metabolism predictors (i.e., metabolically labile sites within potential substrates) and metabolite structure predictors. We summarize the different approaches taken by these models, such as rule‐based methods, machine learning, data mining, quantum chemical methods, molecular interaction fields, and docking. We highlight the scope and limitations of each method and discuss future implications for the field of metabolism prediction in drug discovery.publishedVersio

    Similarity-Based Methods and Machine Learning Approaches for Target Prediction in Early Drug Discovery: Performance and Scope

    Get PDF
    Computational methods for predicting the macromolecular targets of drugs and drug-like compounds have evolved as a key technology in drug discovery. However, the established validation protocols leave several key questions regarding the performance and scope of methods unaddressed. For example, prediction success rates are commonly reported as averages over all compounds of a test set and do not consider the structural relationship between the individual test compounds and the training instances. In order to obtain a better understanding of the value of ligand-based methods for target prediction, we benchmarked a similarity-based method and a random forest based machine learning approach (both employing 2D molecular fingerprints) under three testing scenarios: a standard testing scenario with external data, a standard time-split scenario, and a scenario that is designed to most closely resemble real-world conditions. In addition, we deconvoluted the results based on the distances of the individual test molecules from the training data. We found that, surprisingly, the similarity-based approach generally outperformed the machine learning approach in all testing scenarios, even in cases where queries were structurally clearly distinct from the instances in the training (or reference) data, and despite a much higher coverage of the known target space.publishedVersio

    Scope of 3D shape-based approaches in predicting the macromolecular targets of structurally complex small molecules including natural products and macrocyclic ligands

    Get PDF
    A plethora of similarity-based, network-based, machine learning, docking and hybrid approaches for predicting the macromolecular targets of small molecules are available today and recognized as valuable tools for providing guidance in early drug discovery. With the increasing maturity of target prediction methods, researchers have started to explore ways to expand their scope to more challenging molecules such as structurally complex natural products and macrocyclic small molecules. In this work, we systematically explore the capacity of an alignment-based approach to identify the targets of structurally complex small molecules (including large and flexible natural products and macrocyclic compounds) based on the similarity of their 3D molecular shape to noncomplex molecules (i.e., more conventional, “drug-like”, synthetic compounds). For this analysis, query sets of 10 representative, structurally complex molecules were compiled for each of the 28 pharmaceutically relevant proteins. Subsequently, ROCS, a leading shape-based screening engine, was utilized to generate rank-ordered lists of the potential targets of the 28 × 10 queries according to the similarity of their 3D molecular shapes with those of compounds from a knowledge base of 272 640 noncomplex small molecules active on a total of 3642 different proteins. Four of the scores implemented in ROCS were explored for target ranking, with the TanimotoCombo score consistently outperforming all others. The score successfully recovered the targets of 30% and 41% of the 280 queries among the top-5 and top-20 positions, respectively. For 24 out of the 28 investigated targets (86%), the method correctly assigned the first rank (out of 3642) to the target of interest for at least one of the 10 queries. The shape-based target prediction approach showed remarkable robustness, with good success rates obtained even for compounds that are clearly distinct from any of the ligands present in the knowledge base. However, complex natural products and macrocyclic compounds proved to be challenging even with this approach, although cases of complete failure were recorded only for a small number of targets.publishedVersio

    BonMOLière: Small-Sized Libraries of Readily Purchasable Compounds, Optimized to Produce Genuine Hits in Biological Screens across the Protein Space

    Get PDF
    Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the “fitness” of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle (“BonMOLière”).publishedVersio

    Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters

    Get PDF
    Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially “badly behaving compounds”, “bad actors”, or “nuisance compounds”. These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on “privileged scaffolds” to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose–response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de, not only provides user-friendly access to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds including undesired substructures.acceptedVersio

    Natural products against acute respiratory infections: Strategies and lessons learned

    Get PDF
    Under embargo until: 11.10.2020Ethnopharmacological relevance: A wide variety of traditional herbal remedies have been used throughout history for the treatment of symptoms related to acute respiratory infections (ARIs). Aim of the review: The present work provides a timely overview of natural products affecting the most common pathogens involved in ARIs, in particular influenza viruses and rhinoviruses as well as bacteria involved in co-infections, their molecular targets, their role in drug discovery, and the current portfolio of available naturally derived anti-ARI drugs. Materials and methods: Literature of the last ten years was evaluated for natural products active against influenza viruses and rhinoviruses. The collected bioactive agents were further investigated for reported activities against ARI-relevant bacteria, and analysed for the chemical space they cover in relation to currently known natural products and approved drugs. Results: An overview of (i) natural compounds active in target-based and/or phenotypic assays relevant to ARIs, (ii) extracts, and (iii) in vivo data are provided, offering not only a starting point for further in-depth phytochemical and antimicrobial studies, but also revealing insights into the most relevant anti-ARI scaffolds and compound classes. Investigations of the chemical space of bioactive natural products based on principal component analysis show that many of these compounds are drug-like. However, some bioactive natural products are substantially larger and have more polar groups than most approved drugs. A workflow with various strategies for the discovery of novel antiviral agents is suggested, thereby evaluating the merit of in silico techniques, the use of complementary assays, and the relevance of ethnopharmacological knowledge on the exploration of the therapeutic potential of natural products. Conclusions: The longstanding ethnopharmacological tradition of natural remedies against ARIs highlights their therapeutic impact and remains a highly valuable selection criterion for natural materials to be investigated in the search for novel anti-ARI acting concepts. We observe a tendency towards assaying for broad-spectrum antivirals and antibacterials mainly discovered in interdisciplinary academic settings, and ascertain a clear demand for more translational studies to strengthen efforts for the development of effective and safe therapeutic agents for patients suffering from ARIs.acceptedVersio

    Skin Doctor: Machine learning models for skin sensitization prediction that provide estimates and indicators of prediction reliability

    Get PDF
    The ability to predict the skin sensitization potential of small organic molecules is of high importance to the development and safe application of cosmetics, drugs and pesticides. One of the most widely accepted methods for predicting this hazard is the local lymph node assay (LLNA). The goal of this work was to develop in silico models for the prediction of the skin sensitization potential of small molecules that go beyond the state of the art, with larger LLNA data sets and, most importantly, a robust and intuitive definition of the applicability domain, paired with additional indicators of the reliability of predictions. We explored a large variety of molecular descriptors and fingerprints in combination with random forest and support vector machine classifiers. The most suitable models were tested on holdout data, on which they yielded competitive performance (Matthews correlation coefficients up to 0.52; accuracies up to 0.76; areas under the receiver operating characteristic curves up to 0.83). The most favorable models are available via a public web service that, in addition to predictions, provides assessments of the applicability domain and indicators of the reliability of the individual predictions. View Full-Text Keywords: skin sensitization potential; prediction; in silico models; machine learning; local lymph node assay (LLNA); cosmetics; drugs; pesticides; chemical space; applicability domainpublishedVersio
    • …
    corecore