    Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

    Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, it may be that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
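The abstract above does not give the AVE formula, so the following is only a minimal sketch of a nearest-neighbor redundancy score in the same spirit, not the authors' implementation. It assumes fingerprints are sets of integer feature IDs, uses Jaccard (Tanimoto) distance, and averages nearest-neighbor coverage over evenly spaced distance thresholds. The names `nn_coverage` and `redundancy_bias` are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence

# Assumption: a fingerprint is a set of "on" feature IDs.
Fingerprint = FrozenSet[int]


def jaccard_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nn_coverage(
    validation: Sequence[Fingerprint],
    training: Sequence[Fingerprint],
    thresholds: Sequence[float],
) -> float:
    """Average, over thresholds, of the fraction of validation molecules
    whose nearest training molecule lies within the threshold distance."""
    nearest = [min(jaccard_distance(v, t) for t in training) for v in validation]
    fractions = [sum(d <= thr for d in nearest) / len(nearest) for thr in thresholds]
    return sum(fractions) / len(fractions)


def redundancy_bias(
    val_actives: Sequence[Fingerprint],
    val_inactives: Sequence[Fingerprint],
    train_actives: Sequence[Fingerprint],
    train_inactives: Sequence[Fingerprint],
    thresholds: Optional[List[float]] = None,
) -> float:
    """Positive values mean validation molecules sit closer to training
    molecules of their own class than to those of the other class."""
    if thresholds is None:
        thresholds = [i / 100 for i in range(101)]  # assumed grid: 0.00 .. 1.00
    active_term = (
        nn_coverage(val_actives, train_actives, thresholds)
        - nn_coverage(val_actives, train_inactives, thresholds)
    )
    inactive_term = (
        nn_coverage(val_inactives, train_inactives, thresholds)
        - nn_coverage(val_inactives, train_actives, thresholds)
    )
    return active_term + inactive_term


# Toy split in which each validation molecule duplicates a training
# molecule of its own class, so the score is strongly positive.
train_act = [frozenset({1, 2, 3})]
train_inact = [frozenset({7, 8, 9})]
val_act = [frozenset({1, 2, 3})]
val_inact = [frozenset({7, 8, 9})]
memorizable = redundancy_bias(val_act, val_inact, train_act, train_inact)
swapped = redundancy_bias(val_inact, val_act, train_act, train_inact)
```

A score near zero would suggest that a nearest-neighbor lookup gains little from the split, whereas a large positive score suggests that memorizing the training set could already separate the validation classes. Both terms are needed because, as the abstract notes, similarity among inactives matters as well as similarity among actives.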

    Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks

    Designing a new drug is a lengthy and expensive process. As the space of potential molecules is very large (10^23-10^60), a common technique during drug discovery is to start from a molecule which already has some of the desired properties. An interdisciplinary team of scientists generates hypotheses about the required changes to the prototype. In this work, we develop an algorithmic, unsupervised approach that automatically generates potential drug molecules given a prototype drug. We show that the molecules generated by the system are valid molecules and significantly different from the prototype drug. Among the compounds generated by the system, we identified 35 FDA-approved drugs. As an example, our system generated Isoniazid, one of the main drugs for tuberculosis. The system is currently being deployed in collaboration with pharmaceutical companies to further analyze the additional generated molecules.

    Population PK modelling and simulation based on fluoxetine and norfluoxetine concentrations in milk: a milk concentration-based prediction model

    AIMS: Population pharmacokinetic (pop PK) modelling can be used for PK assessment of drugs in breast milk. However, complex mechanistic modelling of a parent drug and an active metabolite using both blood and milk samples is challenging. We aimed to develop a simple predictive pop PK model for milk concentration-time profiles of a parent drug and a metabolite, using data on fluoxetine (FX) and its active metabolite, norfluoxetine (NFX), in milk.

    A Structure-Based Approach for Mapping Adverse Drug Reactions to the Perturbation of Underlying Biological Pathways

    Adverse drug reactions (ADRs), also known as side effects, are complex undesired physiologic phenomena observed secondary to the administration of pharmaceuticals. Several phenomena underlie the emergence of each ADR; however, a dominant factor is the drug's ability to modulate one or more biological pathways. Understanding the biological processes behind the occurrence of ADRs would lead to the development of safer and more effective drugs. At present, no method exists to discover these ADR-pathway associations. In this paper we introduce a computational framework for identifying a subset of these associations based on the assumption that drugs capable of modulating the same pathway may induce similar ADRs. Our model exploits multiple information resources. First, we utilize a publicly available dataset pairing drugs with their observed ADRs. Second, we identify putative protein targets for each drug using the protein structure database and in silico virtual docking. Third, we label each protein target with its known involvement in one or more biological pathways. Finally, the relationships among these information sources are mined using multiple stages of logistic regression while controlling for overfitting and multiple-hypothesis testing. As a proof of concept, we examined a dataset of 506 ADRs, 730 drugs, and 830 human protein targets. Our method yielded 185 ADR-pathway associations, of which 45 were selected to undergo a manual literature review. We found 32 associations to be supported by the scientific literature.

    AI is a viable alternative to high throughput screening: a 318-target study

    High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders, high-quality X-ray crystal structures, or manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.

    Wallach, Izhar

    Improving Posing and Ranking of Molecular Docking

    Molecular docking is a computational tool commonly applied in drug discovery projects and fundamental biological studies of protein-ligand interactions. Traditionally, molecular docking is used to address one of the following three questions: (i) given a ligand molecule and a protein receptor, predict the binding mode (pose) of the ligand within the receptor; (ii) screen a collection of small molecules against a receptor and rank the ligands by their likelihood of being active; and (iii) given a ligand molecule and a target receptor, predict the binding affinity of the two. Here, we focus on the first two questions, namely pose prediction and ranking. Currently, state-of-the-art docking algorithms predict poses within 2 Å of the native pose at a rate lower than ∼60% and, in many cases, below 40%. In ranking, their ability to identify active ligands is inconsistent and generally suffers from a high false-positive rate. In this thesis we present novel algorithms to enhance the ability of molecular docking to address these two questions. These algorithms do not replace traditional docking but are applied on top of it to provide a synergistic effect. Our algorithms improve pose predictions by 0.5-1.0 Å and ranking order for 23% of the targets in gold-standard benchmarks. Equally important, the algorithms improve the consistency of the posing and ranking predictions over diverse sets of targets and screening libraries. In addition to posing and ranking, we present the pharmacophore concept. A pharmacophore is an ensemble of physicochemical descriptors associated with a biological target that elucidates common interaction patterns of ligands with that target. We introduce a novel pharmacophore inference algorithm and demonstrate its utilization in molecular docking. This thesis is outlined as follows. First, we introduce the molecular docking approach for pose prediction and ranking. Second, we discuss the pharmacophore concept and present algorithms for pharmacophore inference. Third, we demonstrate the utilization of pharmacophores for pose prediction by re-scoring candidate poses generated by docking algorithms. Finally, we present algorithms to improve ranking by reducing bias in scoring functions employed by docking algorithms.
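The 2 Å success criterion mentioned in this abstract can be illustrated with a short sketch. It assumes that predicted and native poses list the same heavy atoms in the same order and share the receptor's coordinate frame, so no superposition is performed; it also ignores ligand symmetry, which a production tool would need to handle. The function names are hypothetical and the coordinates are made up for illustration.

```python
import math
from typing import Sequence, Tuple

# Assumption: a pose is a sequence of (x, y, z) atom coordinates in ångströms.
Coord = Tuple[float, float, float]
Pose = Sequence[Coord]


def pose_rmsd(predicted: Pose, native: Pose) -> float:
    """Root-mean-square deviation between matched atoms of two poses."""
    if len(predicted) != len(native) or len(predicted) == 0:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
        for (px, py, pz), (nx, ny, nz) in zip(predicted, native)
    )
    return math.sqrt(squared / len(predicted))


def pose_success_rate(
    pairs: Sequence[Tuple[Pose, Pose]], cutoff: float = 2.0
) -> float:
    """Fraction of (predicted, native) pairs whose RMSD is within the cutoff."""
    hits = sum(pose_rmsd(pred, nat) <= cutoff for pred, nat in pairs)
    return hits / len(pairs)


# Made-up two-atom ligand: one prediction is shifted by 1 Å, the other by 3 Å.
native_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
far_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
rate = pose_success_rate([(close_pose, native_pose), (far_pose, native_pose)])
```

Under this criterion, an improvement of 0.5-1.0 Å in RMSD can move a borderline pose from failure to success, which is one reason pose accuracy is usually reported both as a mean RMSD and as a success rate at a fixed cutoff.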