
    A novel hybrid ultrafast shape descriptor method for use in virtual screening.

    BACKGROUND: We have introduced a new Hybrid descriptor composed of the MACCS key descriptor, which encodes topological information, and Ballester and Richards' Ultrafast Shape Recognition (USR) descriptor. The latter is calculated from the moments of the distribution of the interatomic distances, and in this work we also included higher moments than in the original implementation. RESULTS: The performance of this Hybrid descriptor is assessed using Random Forest and a dataset of 116,476 molecules. Our dataset includes 5,245 molecules in ten classes from the 2005 World Anti-Doping Agency (WADA) dataset and 111,231 molecules from the National Cancer Institute (NCI) database. In a 10-fold Monte Carlo cross-validation, this dataset was partitioned into three distinct parts for training, optimisation of an internal threshold that we introduced, and validation of the resulting model. The standard errors obtained were used to assess the statistical significance of observed improvements in the performance of our new descriptor. CONCLUSION: The Hybrid descriptor was compared to the MACCS key descriptor, USR with the first three (USR), four (UF4) and five (UF5) moments, and a combination of MACCS with USR (three moments). The MACCS key descriptor was not combined with UF5, because UF5 and UF4 performed similarly. The MACCS/UF4 Hybrid descriptor outperformed all other descriptors examined on every figure of merit. These figures of merit include recall in the top 1% and top 5% of the ranked validation sets, precision, F-measure, area under the Receiver Operating Characteristic curve and Matthews Correlation Coefficient.
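
    The abstract describes USR as a set of moments of interatomic distance distributions. A minimal Python sketch of that idea follows; it is not the authors' implementation. It assumes the four reference points commonly used for USR (centroid, atom closest to the centroid, atom farthest from the centroid, and atom farthest from that atom), and the function name usr_descriptor, the n_moments parameter, and the signed k-th root scaling of the higher moments are illustrative assumptions, since the abstract does not state how the moments were normalised.

```python
import numpy as np

def usr_descriptor(coords, n_moments=3):
    """Sketch of a USR-style shape descriptor (hypothetical helper).

    coords: (n_atoms, 3) array of atomic coordinates.
    n_moments: number of moments per reference point (3, 4 or 5 would
               correspond to the USR, UF4 and UF5 variants named above).
    """
    coords = np.asarray(coords, dtype=float)

    # Reference point 1: molecular centroid.
    ctd = coords.mean(axis=0)
    d_ctd = np.linalg.norm(coords - ctd, axis=1)

    # Reference points 2 and 3: atoms closest to / farthest from the centroid.
    cst = coords[d_ctd.argmin()]
    fct = coords[d_ctd.argmax()]

    # Reference point 4: atom farthest from reference point 3.
    d_fct = np.linalg.norm(coords - fct, axis=1)
    ftf = coords[d_fct.argmax()]

    features = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(coords - ref, axis=1)
        mean = d.mean()
        features.append(mean)  # first moment
        for k in range(2, n_moments + 1):
            central = np.mean((d - mean) ** k)
            # Assumption: a signed k-th root keeps every moment in
            # units of distance; other scalings are possible.
            features.append(np.sign(central) * np.abs(central) ** (1.0 / k))
    return np.array(features)
```

    With n_moments=3 the sketch yields 12 values per molecule and with n_moments=4 it yields 16. Because only interatomic distances enter the calculation, the result does not change when the molecule is translated or rotated.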

    Exploring Off-Targets and Off-Systems for Adverse Drug Reactions via Chemical-Protein Interactome — Clozapine-Induced Agranulocytosis as a Case Study

    In the era of personalized medical practice, understanding the genetic basis of patient-specific adverse drug reactions (ADRs) is a major challenge. Clozapine is an effective treatment for schizophrenia, but its use is limited by life-threatening agranulocytosis. A recent high-impact study showed the necessity of making clozapine a first-line drug, so identifying biomarkers for drug-induced agranulocytosis has become important. Here we report a methodology termed antithesis chemical-protein interactome (CPI), which uses docking to mimic the differences in drug-protein interactions across a panel of human proteins. Using this method, we identified HSPA1A, a known susceptibility gene for clozapine-induced agranulocytosis (CIA), as an off-target of clozapine. Furthermore, HSPA1A-related genes (off-target associated systems) were found to be differentially expressed at the mRNA level in a clozapine-treated leukemia cell line. Apart from identifying the CIA causal genes, we identified several novel candidate genes that could be responsible for agranulocytosis. Proteins related to the reactive oxygen clearance system, such as oxidoreductases and glutathione metabolite enzymes, were significantly enriched in the antithesis CPI. This methodology provides a multi-dimensional analysis of drugs' perturbation of the biological system, investigating both the off-targets and the associated off-systems to explore the molecular basis of an adverse event or new uses for old drugs.

    Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P

    This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r² = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r² = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r² = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.
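
    The two statistics quoted in this abstract can be computed as in the short Python sketch below. The function names are illustrative, and the sketch assumes the coefficient of determination is defined as 1 - SS_res/SS_tot; some QSPR papers report the squared Pearson correlation instead, and the abstract does not say which definition was used.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only, not data from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

    RMSE carries the units of the property being predicted, so the melting point error (in °C) cannot be compared directly with the Log P and Log S errors (in log units); r² is unitless and is the more natural basis for ranking the three properties.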