18 research outputs found

    Enrichment of metabolic routes through Big Data

    The Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway is a database that contains graphical representations of cellular processes. Cellular processes are basic systems involving biochemical reactions at the cellular level, such as transport, catabolism, metabolism, growth and cell death. KEGG Pathway presents this information as graphs that represent the molecular interactions between genes, processes and chemical compounds. This paper proposes applying the Big Data Analytics Life Cycle methodology to enrich the metabolic pathways of the KEGG Pathway database using the Target Fishing technique.

    Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more

    Mussa and Glen would like to thank Unilever for financial support, whereas Mussa and Mitchell thank the BBSRC for funding this research through grant BB/I00596X/1. Mitchell thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support.
    Background: In a recent paper, Mussa, Mitchell and Glen (MMG) mathematically demonstrated that the “Laplacian Corrected Modified Naïve Bayes” (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme in which the role played by the absence of compound features in assigning a compound to its appropriate class is ignored. MMG also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG’s work and introduces a new version of the SNB classifier: “Tapered Naïve Bayes” (TNB). TNB does not discard the role of the absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.
    Results: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes; 23 membrane receptors; and one ion channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the “optimal” number of features for a given data set, SNB classifiers systematically gave better classification results than LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the “optimal” number of features for that data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the “optimal” number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
    Conclusions: The classification results obtained in this study concur with the mathematically based guidelines given in MMG’s paper: ignoring the role of the absence of a feature out of hand does not necessarily improve the classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking the absence of features into account can enhance (not impair) the discriminatory classification power of the SNB approach.
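The distinction the abstract draws, between counting and ignoring the absence of a feature, can be illustrated with a small Bernoulli-style Naïve Bayes scorer. This is a minimal sketch, not the LCMNB, SNB or TNB formulation from the paper: the function names, the Laplace smoothing constant and the `absence_weight` parameter are assumptions introduced here for illustration. Setting `absence_weight=1.0` counts absent features fully, `0.0` ignores them, and intermediate values are only a hypothetical way of partially weighting absence.

```python
import math

def fit_bernoulli_nb(X, y, n_classes, smoothing=1.0):
    """Estimate class priors and per-feature presence probabilities.

    X is a list of binary feature vectors and y a list of class indices.
    Every class is assumed to occur at least once in y.
    """
    n_features = len(X[0])
    counts = [0] * n_classes
    present = [[0] * n_features for _ in range(n_classes)]
    for row, label in zip(X, y):
        counts[label] += 1
        for j, bit in enumerate(row):
            present[label][j] += bit
    priors = [c / len(X) for c in counts]
    # Laplace smoothing keeps every probability strictly between 0 and 1.
    probs = [[(present[c][j] + smoothing) / (counts[c] + 2 * smoothing)
              for j in range(n_features)] for c in range(n_classes)]
    return priors, probs

def log_score(row, prior, probs, absence_weight):
    """Log-score of one class; absent features are scaled by absence_weight
    (a hypothetical knob, not taken from the paper)."""
    score = math.log(prior)
    for bit, p in zip(row, probs):
        if bit:
            score += math.log(p)
        else:
            score += absence_weight * math.log(1.0 - p)
    return score

def predict(row, priors, probs, absence_weight=1.0):
    """Return the index of the class with the highest log-score."""
    scores = [log_score(row, priors[c], probs[c], absence_weight)
              for c in range(len(priors))]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, after `priors, probs = fit_bernoulli_nb(X, y, n_classes=2)`, calling `predict(row, priors, probs, absence_weight=0.0)` scores a compound from its present features only.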

    Target fishing and molecular modeling of nitroheteroarylchalcones with potent antituberculosis activity

    Computer-aided drug design strategies have contributed to the research and development of new anti-tuberculosis (TB) drugs that avoid resistance and reduce treatment time and the number of drugs used in therapy. The aim of this work was to carry out a docking study to identify the possible mechanism of action of the hits LabMol73, 84, 86 and 93. These compounds were previously tested in the Microplate Alamar Blue Assay (MABA) and the Low Oxygen Recovery Assay (LORA) against M.tb. H37Rv strains, both sensitive and resistant to the standard drugs rifampicin and isoniazid; their promising inhibitory results against these strains suggest a mechanism of action different from that of existing drugs. Reverse virtual screening was performed on the PharmMapper platform, which identifies targets using pharmacophore models. The most promising targets were validated. Molecular docking was performed in the OpenEye Maestro program to analyze poses and energy scores. Sixteen M.tb. H37Rv targets were identified, of which only nine proved viable for computational testing. The most promising results were observed for mycolic acid cyclopropane synthase (PDB: 1L1E) and pantothenate synthetase (PDB: 1N2B): scores ranged from -6.998 to -7.767 kcal/mol for 1L1E and from -6.421 to -7.293 kcal/mol for 1N2B. The similar scores for the two targets suggest that the mechanism of action may be the inhibition of one of them. These targets are promising for elucidating the mechanism of action of the analyzed nitroheteroaryl chalcones, because they are consistent with the assay results against resistant strains, which indicate that the standard drugs act on other targets, and because mycolic acid and pantothenate are directly linked to the virulence and resistance of M.tb. H37Rv.

    Accurate and efficient target prediction using a potency-sensitive influence-relevance voter

    Background: A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows.
    Results: Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database.
    Conclusions: We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/
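The ligand-based idea mentioned in the background, fingerprints plus chemical similarity, can be sketched as a similarity-weighted nearest-neighbour vote. This is not the IRV or PS-IRV from the paper. It is a plain k-nearest-neighbour illustration that assumes fingerprints are given as sets of "on" bit indices, and the names `tanimoto`, `predict_targets` and `library` are hypothetical.

```python
from collections import defaultdict

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def predict_targets(query, library, k=3):
    """Rank targets for a query fingerprint.

    library is a list of (fingerprint_set, target_name) pairs. The k most
    similar library molecules each vote for their target, weighted by
    their similarity to the query.
    """
    scored = [(tanimoto(query, fp), target) for fp, target in library]
    nearest = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    votes = defaultdict(float)
    for similarity, target in nearest:
        votes[target] += similarity
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
```

A call such as `predict_targets({1, 2, 3}, library, k=2)` returns a list of `(target, vote)` pairs ordered from most to least supported.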

    Target prediction utilising negative bioactivity data covering large chemical space.

    BACKGROUND: In silico analyses are increasingly being used to support mode-of-action investigations; however, many such approaches do not utilise the large amounts of inactive data held in chemogenomic repositories. The objective of this work is to integrate such bioactivity data into the target prediction of orphan compounds, producing probabilities of activity and inactivity for a range of targets. To this end, a novel human bioactivity data set was constructed through the assimilation of over 195 million bioactivity data points deposited in the ChEMBL and PubChem repositories, and the subsequent application of a sphere-exclusion selection algorithm to oversample presumed inactive compounds.
    RESULTS: A Bernoulli Naïve Bayes algorithm was trained using the data and evaluated using fivefold cross-validation, achieving a mean recall and precision of 67.7 and 63.8 % for active compounds and 99.6 and 99.7 % for inactive compounds, respectively. We show that the performance of the models is considerably influenced by the underlying intraclass training similarity, the size of a given class of compounds, and the degree of additional oversampling. The method was also validated using compounds extracted from WOMBAT, producing average precision-recall AUC and BEDROC scores of 0.56 and 0.85, respectively. Inactive data points used for this test are based on presumed inactivity, producing an approximate indication of the true extrapolative ability of the models. A distance-based applicability domain analysis was also conducted, indicating that an average Tanimoto Coefficient distance of 0.3 or greater between a test and training set can be used to give a global measure of confidence in model predictions. A final comparison to a method trained solely on active data from ChEMBL gave precision-recall AUC and BEDROC scores of 0.45 and 0.76.
    CONCLUSIONS: The inclusion of inactive data for model training produces models with superior AUC and improved early recognition capabilities, although the results from internal and external validation of the models show differing performance across the breadth of models. The realised target prediction protocol is available at https://github.com/lhm30/PIDGIN.
    Graphical abstract: The inclusion of large-scale negative training data for in silico target prediction improves the precision-recall AUC and BEDROC scores for target models.
    The authors thank Krishna C. Bulusu for proofreading the manuscript. LHM would like to thank BBSRC and AstraZeneca for their funding. GD thanks EPSRC and Eli Lilly for funding. This is the final version of the article. It first appeared from Springer via http://dx.doi.org/10.1186/s13321-015-0098-
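The per-class recall and precision figures quoted in the results follow the usual definitions: precision is TP / (TP + FP) and recall is TP / (TP + FN), computed separately with "active" and then "inactive" as the class of interest. A minimal sketch of that calculation is given below. The function name and the toy labels in the usage note are illustrative assumptions and do not reproduce the paper's data or evaluation code.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class of interest.

    y_true and y_pred are equal-length sequences of labels; positive is
    the label treated as the class being evaluated.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Return 0.0 when a ratio is undefined (no predictions or no members).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With labels 1 for active and 0 for inactive, `precision_recall(y_true, y_pred, positive=1)` gives the figures for actives and `positive=0` the figures for inactives. In a cross-validated setting these values would be averaged over the folds.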