
    In-silico Predictive Mutagenicity Model Generation Using Supervised Learning Approaches

    With the advent of high-throughput screening techniques, it is feasible to filter possible leads from a vast chemical space that can act against a particular target and inhibit its action. Virtual screening complements in vitro assays, which are costly and time-consuming. It sorts biologically active molecules by using the structural and chemical information of the compounds and the target proteins in order to identify potential hits. Various data mining and machine learning tools apply classifier algorithms to molecular descriptors as part of the knowledge discovery process, classifying potentially active hits for drug development.

    Predictive models for anti-tubercular molecules using machine learning on high-throughput biological screening datasets

    Background: Tuberculosis is a contagious disease caused by Mycobacterium tuberculosis (Mtb) that affects more than two billion people around the globe and is one of the major causes of morbidity and mortality in the developing world. Recent reports suggest that Mtb has been developing resistance to the widely used anti-tubercular drugs, resulting in the emergence and spread of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains throughout the world. In view of this global epidemic, there is an urgent need for fast and efficient lead identification methodologies. Target-based screening of large compound libraries has been widely used as a fast and efficient approach for lead identification, but it is restricted by the available knowledge of the target structure. Whole-organism screens, on the other hand, are target-agnostic and are now widely employed as an alternative for lead identification, but they are limited by the time and cost involved in running the screens for large compound libraries. This could possibly be circumvented by using computational approaches to prioritize molecules for screening programmes. Results: We utilized physicochemical properties of compounds to train four supervised classifiers (Naïve Bayes, Random Forest, J48 and SMO) on three publicly available bioassay screens of Mtb inhibitors and validated the robustness of the predictive models using various statistical measures. Conclusions: This study is a comprehensive analysis of high-throughput bioassay data for anti-tubercular activity and of the application of machine learning approaches to create target-agnostic predictive models for anti-tubercular agents.

    Performance of knowledge-based biological models in higher dimensional chemical space

    This study evaluates the improvement of knowledge-based biological models obtained by adding advanced molecular descriptors to the existing classical descriptors. The inclusion of constitutional, topological, and hybrid descriptors in biological models trained on an Mtb (Mycobacterium tuberculosis) bioassay dataset, using classifiers such as Random Forest, J48, Naive Bayes, and SMO (Sequential Minimal Optimization), was found to enhance the performance of these models. © 2016 Elsevier B.V. All rights reserved.

    In-silico predictive mutagenicity model generation using supervised learning approaches

    Background: Experimental screening of chemical compounds for biological activity is a time-consuming and expensive practice. In silico predictive models permit inexpensive, rapid “virtual screening” to prioritize the selection of compounds for experimental testing. Both experimental and in silico screening can be used to test compounds for desirable or undesirable properties. Prior work on the prediction of mutagenicity has primarily involved identification of toxicophores rather than whole-molecule predictive models. In this work, we examined a range of in silico predictive classification models for prediction of mutagenic properties of compounds, including methods such as J48 and SMO, which have not previously been widely applied in cheminformatics. Results: The Bursi mutagenicity data set containing 4337 compounds (Set 1) and a Benchmark data set of 6512 compounds (Set 2) were taken as input data sets in this work. A third data set (Set 3) was prepared by merging the previous two sets. Classification algorithms including Naïve Bayes, Random Forest, J48 and SMO, with 10-fold cross-validation and default parameters, were used for model generation on these data sets. Models built using the combined data set performed better than those developed from the Benchmark data set. Significantly, Random Forest outperformed the other classifiers for all the data sets, especially for Set 3, with 89.27% accuracy, 89% precision and an ROC area of 95.3%. To validate the developed models, two external data sets with mutagenicity data, AID1189 and AID1194, were tested, showing 62% accuracy, 67% precision and 65% ROC area for AID1189 and 91% accuracy, 91% precision and 96.3% ROC area for AID1194. A Random Forest model was applied to approved drugs from DrugBank and metabolites from the ZINC database, giving a true positive rate of almost 85%, which shows the robustness of the model. Conclusion: We have created a new mutagenicity benchmark data set with around 8,000 compounds. Our work shows that highly accurate predictive mutagenicity models can be built using machine learning methods based on chemical descriptors and trained using this set, and that these models complement toxicophore-based methods. Further, our work supports other recent literature in showing that Random Forest models generally outperform other comparable machine learning methods for this kind of application.
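
As a rough illustration of the evaluation protocol named in the abstract above (10-fold cross-validation scored by accuracy and precision), here is a minimal sketch in plain Python. The function names, the toy single-descriptor threshold classifier, and the synthetic data are all assumptions made for illustration; they do not reproduce the Naïve Bayes, Random Forest, J48 or SMO models of the study, nor any of its reported results.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, pool all predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_true.append(y[i])
            y_pred.append(predict(model, X[i]))
    return accuracy_and_precision(y_true, y_pred)


# Hypothetical stand-in classifier: a threshold on one descriptor value,
# placed midway between the two class means. It assumes both classes are
# present in every training split.
def fit_threshold(X_train, y_train):
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


if __name__ == "__main__":
    # Synthetic, well-separated descriptor values (not real assay data).
    X = [0.1 * i for i in range(10)] + [2 + 0.1 * i for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(cross_validate(X, y, fit_threshold, predict_threshold, k=10))
```

Pooling the held-out predictions from all folds before scoring is one common convention; averaging per-fold scores is another, and the abstract does not say which was used.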