31 research outputs found

    The influence of negative training set size on machine learning-based virtual screening

    BACKGROUND: The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. RESULTS: The impact of this rather neglected aspect of applying machine learning methods was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. CONCLUSIONS: The ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. Moreover, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
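
The precision, hit recall and MCC values discussed in this abstract can all be derived from confusion-matrix counts. The following is a minimal sketch in Python, not the authors' code; the function name `screening_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def screening_metrics(tp, fp, tn, fn):
    """Return (precision, recall, MCC) from confusion-matrix counts.

    tp/fn: actives predicted active/inactive.
    fp/tn: inactives predicted active/inactive.
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews Correlation Coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


if __name__ == "__main__":
    # Hypothetical screening outcome: 100 actives, 900 inactives.
    precision, recall, mcc = screening_metrics(tp=90, fp=10, tn=890, fn=10)
    print(f"precision={precision:.3f} recall={recall:.3f} MCC={mcc:.3f}")
```

Because MCC uses true negatives as well as true positives, it responds to changes in how many inactives are rejected, whereas recall depends only on how the actives are classified.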

    A novel hybrid ultrafast shape descriptor method for use in virtual screening.

    BACKGROUND: We have introduced a new Hybrid descriptor composed of the MACCS key descriptor, which encodes topological information, and Ballester and Richards' Ultrafast Shape Recognition (USR) descriptor. The latter is calculated from the moments of the distribution of the interatomic distances, and in this work we also included higher moments than in the original implementation. RESULTS: The performance of this Hybrid descriptor is assessed using Random Forest and a dataset of 116,476 molecules. Our dataset includes 5,245 molecules in ten classes from the 2005 World Anti-Doping Agency (WADA) dataset and 111,231 molecules from the National Cancer Institute (NCI) database. In a 10-fold Monte Carlo cross-validation this dataset was partitioned into three distinct parts for training, optimisation of an internal threshold that we introduced, and validation of the resulting model. The standard errors obtained were used to assess the statistical significance of observed improvements in performance of our new descriptor. CONCLUSION: The Hybrid descriptor was compared to the MACCS key descriptor, USR with the first three (USR), four (UF4) and five (UF5) moments, and a combination of MACCS with USR (three moments). The MACCS key descriptor was not combined with UF5, due to the similar performance of UF5 and UF4. Superior performance in terms of all figures of merit was found for the MACCS/UF4 Hybrid descriptor with respect to all other descriptors examined. These figures of merit include recall in the top 1% and top 5% of the ranked validation sets, precision, F-measure, area under the Receiver Operating Characteristic curve and Matthews Correlation Coefficient.
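
To make the idea of a moment-based shape descriptor concrete, here is a rough Python sketch of a USR-style calculation with a configurable number of moments. It is an illustration under stated assumptions, not the authors' implementation: it uses the four reference points commonly described for USR (centroid, atom closest to the centroid, atom farthest from the centroid, atom farthest from that atom), and it takes the mean plus plain central moments for orders 2 and above, whereas published implementations may normalise the higher moments differently. The function names are hypothetical.

```python
import math


def _moments(values, n_moments):
    """Mean followed by central moments of order 2..n_moments (assumed form)."""
    mean = sum(values) / len(values)
    result = [mean]
    for order in range(2, n_moments + 1):
        result.append(sum((v - mean) ** order for v in values) / len(values))
    return result


def usr_descriptor(coords, n_moments=3):
    """USR-style shape descriptor for a list of (x, y, z) atom coordinates.

    Returns 4 * n_moments numbers: n_moments per reference point.
    """
    n_atoms = len(coords)
    # Reference point 1: molecular centroid.
    ctd = tuple(sum(atom[i] for atom in coords) / n_atoms for i in range(3))
    d_ctd = [math.dist(atom, ctd) for atom in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from the farthest-from-centroid atom.
    d_fct = [math.dist(atom, fct) for atom in coords]
    ftf = coords[d_fct.index(max(d_fct))]

    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        distances = [math.dist(atom, ref) for atom in coords]
        descriptor.extend(_moments(distances, n_moments))
    return descriptor


if __name__ == "__main__":
    # Hypothetical four-atom geometry, coordinates in arbitrary units.
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)]
    print(len(usr_descriptor(atoms, n_moments=3)))  # three moments per point
    print(len(usr_descriptor(atoms, n_moments=4)))  # four moments per point
```

Under these assumptions, a hybrid descriptor in the spirit of the abstract would concatenate such a vector with a topological fingerprint such as MACCS keys; that step is omitted here because it needs a cheminformatics toolkit.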

    BINARY QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIP ANALYSIS IN RETROSPECTIVE STRUCTURE-BASED VIRTUAL SCREENING CAMPAIGNS TARGETING ESTROGEN RECEPTOR ALPHA

    Objective: The objective of this study is to construct predictive, unbiased structure-based virtual screening (SBVS) protocols to identify potent ligands for estrogen receptor alpha by combining molecular docking, protein-ligand interaction fingerprinting (PLIF), and binary quantitative structure-activity relationship (QSAR) analysis using the recursive partition and regression tree method. Methods: Employing the enhanced version of the Directory of Useful Decoys, SBVS protocols using molecular docking simulations and PLIF were constructed and retrospectively validated. To avoid bias, the SMILES format of the compounds was used. The predictive abilities of the SBVS protocols were then compared based on the enrichment factor (EF) and the F-measure values. Results: The SBVS protocols resulting from this research were SBVS_1 (employing docking scores of the best pose of every compound to rank the results and selecting compounds within 1% false positives as positive), SBVS_2 (employing the decision tree resulting from the binary QSAR analysis using docking scores and PLIF bitstrings of the best pose of every compound as descriptors), and SBVS_3 (employing the decision tree resulting from the binary QSAR analysis using ensemble PLIF of the selected poses, with the optimized docking score as the cutoff). The EF values of SBVS_1, SBVS_2, and SBVS_3 are 28.315, 576.084, and 713.472, respectively, while their F-measure values are 0.310, 0.573, and 0.769, respectively. Conclusion: Highly predictive, unbiased SBVS protocols to identify potent estrogen receptor alpha ligands were constructed. Further application in prospective screening is therefore highly recommended.
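
The two figures of merit used to compare these protocols can be sketched in a few lines of Python. The F-measure below is the standard harmonic mean of precision and recall. The abstract does not define its enrichment factor, so the version here is an assumption: the ratio of the true positive rate to the false positive rate. Another common definition compares the hit rate in the selected subset with the hit rate in the whole library, and the two can give very different numbers. The function names and example counts are hypothetical.

```python
import math


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def enrichment_factor(tp, fp, tn, fn):
    """EF as true positive rate over false positive rate (assumed definition)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    if fpr == 0.0:
        # No decoy was selected: the ratio is unbounded (or undefined if tpr is 0).
        return math.inf if tpr > 0.0 else 0.0
    return tpr / fpr


if __name__ == "__main__":
    # Hypothetical retrospective screen: 60 actives and 1000 decoys.
    tp, fn = 30, 30
    fp, tn = 10, 990
    print(f"F-measure = {f_measure(tp, fp, fn):.3f}")
    print(f"EF        = {enrichment_factor(tp, fp, tn, fn):.1f}")
```

Under this assumed definition the EF grows very quickly as the number of selected decoys approaches zero, which is one way a protocol can reach values in the hundreds.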

    OPTIMIZING STRUCTURE-BASED VIRTUAL SCREENING PROTOCOL TO IDENTIFY PHYTOCHEMICALS AS CYCLOOXYGENASE-2 INHIBITORS

    By employing the Directory of Useful Decoys (DUD) and its enhanced version (DUD-E), several attempts have been made to construct validated structure-based virtual screening (SBVS) protocols to identify cyclooxygenase-2 (COX-2) inhibitors. Both databases tag compounds with IC50 values < 1 µM as active COX-2 inhibitors. Phytochemicals investigated as natural COX-2 inhibitors, however, mostly have IC50 values in the micromolar range, so they are likely to be classified as non-inhibitors of COX-2 by the available SBVS protocols. In this article, validation of an SBVS protocol that adds marginally active COX-2 inhibitors from DUD-E to the set of active compounds is presented. Binary quantitative structure-activity relationship analysis using the recursive partition and regression tree method was subsequently performed to optimize the predictive ability of the protocol. The enrichment factor and the F-measure values of the optimized protocol reached 44.78 and 0.47, respectively. The optimized protocol identified 1 out of 9 phytochemicals as a COX-2 inhibitor.

    Computer-aided Design of Chalcone Derivatives as Lead Compounds Targeting Acetylcholinesterase

    One of the well-established biological activities of chalcone derivatives is acetylcholinesterase inhibition, which can be developed for the therapy of Alzheimer’s disease. Assisted by a retrospectively validated structure-based virtual screening (SBVS) protocol to identify potent acetylcholinesterase inhibitors, 80 chalcone derivatives were designed and virtually screened. The F-measure value, used as the parameter of the predictive ability of the SBVS protocol developed in the research presented in this article, was 0.413, which was considerably better than that of the original SBVS protocol (F-measure = 0.226). Among the screened chalcone derivatives, two were selected as potential lead compounds to design potent inhibitors for acetylcholinesterase: 3-[4-(benzyloxy)-3-methoxyphenyl]-1-(4-hydroxy-3-methoxyphenyl)prop-2-en-1-one (3k) and 3-[4-(benzyloxy)-3-methoxyphenyl]-1-(4-hydroxyphenyl)prop-2-en-1-one (4k).