93 research outputs found

    Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

    Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high levels of accuracy. However, activity cliffs – pairs of molecules that are highly similar in their structure but exhibit large differences in potency – have been underinvestigated for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well-equipped to accurately predict the potency of activity cliffs have an increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked more than 20 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. These results advocate for (a) the inclusion of dedicated “activity-cliff-centered” metrics during model development and evaluation, and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community towards addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs. This data deposit contains all trained models and the data used to train them. All models can be easily loaded and used to predict bioactivity on new molecules with MoleculeACE. Since models are target-specific, models are provided for all 30 data sets. Every model is accompanied by a configuration file that describes its (optimized) hyperparameters.
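
    The definition of an activity cliff given above (high structural similarity, large potency difference) and the idea of an "activity-cliff-centered" metric can be illustrated with a minimal sketch. It is not the MoleculeACE implementation: the fingerprints are toy sets of feature indices, and the function names, the 0.9 similarity threshold, and the one-log-unit potency gap are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(fingerprints, potency, sim_threshold=0.9, potency_gap=1.0):
    """Return index pairs that are structurally similar but differ strongly in potency.

    Both thresholds are illustrative assumptions, not values taken from the abstract.
    """
    cliffs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        distant = abs(potency[i] - potency[j]) >= potency_gap
        if similar and distant:
            cliffs.append((i, j))
    return cliffs


def rmse(y_true, y_pred, indices=None):
    """Root-mean-square error, optionally restricted to a subset of indices."""
    idx = list(range(len(y_true))) if indices is None else sorted(indices)
    return sqrt(sum((y_true[k] - y_pred[k]) ** 2 for k in idx) / len(idx))


# Toy data: molecules 0 and 1 share 10 of 11 bits but differ by 2.2 log units.
fingerprints = [
    frozenset(range(10)),
    frozenset(range(11)),
    frozenset(range(20, 30)),
]
potency = [5.0, 7.2, 5.1]      # hypothetical pKi-like values
predicted = [6.0, 6.0, 5.0]    # hypothetical model output

cliffs = find_activity_cliffs(fingerprints, potency)
cliff_compounds = {k for pair in cliffs for k in pair}

rmse_all = rmse(potency, predicted)
rmse_cliff = rmse(potency, predicted, cliff_compounds)
print(cliffs)                  # [(0, 1)]
print(rmse_all, rmse_cliff)
```

    Reporting the error on cliff compounds next to the overall error, as in the last lines, is one simple way to expose a model that looks accurate on average but fails on these edge cases.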

    Structure-based drug discovery with deep learning

    Artificial intelligence (AI) in the form of deep learning bears promise for drug discovery and chemical biology, e.g., to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules de novo. While most of the deep learning efforts in drug discovery have focused on ligand-based approaches, structure-based drug discovery has the potential to tackle unsolved challenges, such as affinity prediction for unexplored protein targets, binding-mechanism elucidation, and the rationalization of related chemical kinetic properties. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a renaissance in structure-based approaches for drug discovery guided by AI. This review summarizes the most prominent algorithmic concepts in structure-based deep learning for drug discovery, and forecasts opportunities, applications, and challenges ahead.

    Identification of novel off targets of baricitinib and tofacitinib by machine learning with a focus on thrombosis and viral infection

    As there are no clear on-target mechanisms that explain the increased risk for thrombosis and viral infection or reactivation associated with JAK inhibitors, the observed elevated risk may be a result of an off-target effect. Computational approaches combined with in vitro studies can be used to predict and validate the potential for an approved drug to interact with additional (often unwanted) targets and identify potential safety-related concerns. Potential off-targets of the JAK inhibitors baricitinib and tofacitinib were identified using two established machine learning approaches based on ligand similarity. The identified targets related to thrombosis or viral infection/reactivation were subsequently validated using in vitro assays. Inhibitory activity was identified for four drug-target pairs (PDE10A [baricitinib], TRPM6 [tofacitinib], PKN2 [baricitinib, tofacitinib]). Previously unknown off-target interactions of the two JAK inhibitors were identified. As the proposed pharmacological effects of these interactions include attenuation of pulmonary vascular remodeling, modulation of HCV response, and hypomagnesemia, the newly identified off-target interactions cannot explain an increased risk of thrombosis or viral infection/reactivation. While further evidence is required to explain both the elevated thrombosis and viral infection/reactivation risk, our results add to the evidence that these JAK inhibitors are promiscuous binders and highlight the potential for repurposing.

    Deep learning for low-data drug discovery: Hurdles and opportunities

    Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches, which are notoriously ‘data-hungry’, might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is attracting great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.

    Effectiveness of molecular fingerprints for exploring the chemical space of natural products

    Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, they have different structural motifs than typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters and a higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints matched or outperformed them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
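
    The statement that different encodings give different views of chemical space can be made concrete by correlating the pairwise similarities that two fingerprints assign to the same molecules. The sketch below is only an illustration of that idea under stated assumptions: the two "encodings" are made-up sets of feature indices rather than real fingerprints, and the helper names are hypothetical and not part of the NP_Fingerprints package.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_similarities(encoded):
    """Flatten the upper triangle of the pairwise Tanimoto matrix into a list."""
    return [tanimoto(encoded[i], encoded[j])
            for i, j in combinations(range(len(encoded)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# The same four hypothetical molecules under two hypothetical encodings.
encoding_a = [frozenset({1, 2, 3}), frozenset({1, 2, 4}),
              frozenset({5, 6}), frozenset({5, 6, 7})]
encoding_b = [frozenset({1, 2}), frozenset({3, 4}),
              frozenset({1, 2, 5}), frozenset({3, 4, 6})]

sims_a = pairwise_similarities(encoding_a)
sims_b = pairwise_similarities(encoding_b)

# A low or negative value means the two encodings disagree about which
# molecules are neighbours in chemical space.
agreement = pearson(sims_a, sims_b)
print(agreement)
```

    In this toy case the two encodings group the molecules differently, so the correlation is negative; with real fingerprints the same comparison would be run over many more molecule pairs.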

    A QSTR-based expert system to predict sweetness of molecules

    This work describes a novel approach based on advanced molecular similarity to predict the sweetness of chemicals. The proposed Quantitative Structure-Taste Relationship (QSTR) model is an expert system developed keeping in mind the five principles defined by the Organization for Economic Co-operation and Development (OECD) for the validation of (Q)SARs. The 649 sweet and non-sweet molecules were described by both conformation-independent extended-connectivity fingerprints (ECFPs) and molecular descriptors. In particular, the molecular similarity in the ECFPs space showed a clear association with molecular taste and it was exploited for model development. Molecules lying in the subspaces where the taste assignment was more difficult were modeled through a consensus between linear and local approaches (Partial Least Squares-Discriminant Analysis and N-nearest-neighbor classifier). The expert system, which was thoroughly validated through a Monte Carlo procedure and an external set, gave satisfactory results in comparison with the state-of-the-art models. Moreover, the QSTR model can be leveraged into a greater understanding of the relationship between molecular structure and sweetness, and into the design of novel sweeteners.
    Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas; Facultad de Ciencias Exacta