6 research outputs found

    Data-driven extraction of relative reasoning rules to limit combinatorial explosion in biodegradation pathway prediction

    Motivation: The University of Minnesota Pathway Prediction System (UM-PPS) is a rule-based expert system to predict plausible biodegradation pathways for organic compounds. However, iterative application of these rules to generate biodegradation pathways leads to combinatorial explosion. We use data from known biotransformation pathways to rationally determine biotransformation priorities (relative reasoning rules) to limit this explosion. Results: A total of 112 relative reasoning rules were identified and implemented. In one prediction step, i.e. as per one generation predicted, the use of relative reasoning decreases the predicted biotransformations by over 25% for 50 compounds used to generate the rules and by about 15% for an external validation set of 47 xenobiotics, including pesticides, biocides and pharmaceuticals. The percentage of correctly predicted, experimentally known products remains at 75% when relative reasoning is used. The set of relative reasoning rules identified, therefore, effectively reduces the number of predicted transformation products without compromising the quality of the predictions. Availability: The UM-PPS server is freely available on the web to all users at the time of submission of this manuscript and will be available following publication at http://umbbd.msi.umn.edu/predict/. Supplementary information: Supplementary data are available at Bioinformatics online.
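The pruning idea described in this abstract can be sketched in a few lines. This is only an illustration under assumptions: the rule identifiers and the `(higher, lower)` pair representation are hypothetical and are not taken from the UM-PPS rule base, and the abstract does not say how chains of priorities are resolved.

```python
def apply_relative_reasoning(triggered, priorities):
    """Drop every triggered rule that is outranked by another triggered rule.

    triggered:  set of rule ids that matched the compound (hypothetical ids).
    priorities: iterable of (higher, lower) pairs, meaning that `higher`
                takes precedence over `lower` when both are triggered.
    """
    suppressed = {
        low for high, low in priorities
        if high in triggered and low in triggered
    }
    return triggered - suppressed


# Hypothetical example: rule_a outranks rule_b, rule_d outranks rule_c.
triggered = {"rule_a", "rule_b", "rule_c"}
priorities = [("rule_a", "rule_b"), ("rule_d", "rule_c")]
remaining = apply_relative_reasoning(triggered, priorities)
print(sorted(remaining))  # ['rule_a', 'rule_c']
```

Here `rule_c` survives because `rule_d` was not triggered for this compound. In this sketch a suppressed rule can still suppress others; whether the actual system behaves this way is not stated in the abstract.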

    Predicting biodegradation products and pathways: a hybrid knowledge- and machine learning-based approach

    Motivation: Current methods for the prediction of biodegradation products and pathways of organic environmental pollutants either do not take into account domain knowledge or do not provide probability estimates. In this article, we propose a hybrid knowledge- and machine learning-based approach to overcome these limitations in the context of the University of Minnesota Pathway Prediction System (UM-PPS). The proposed solution performs relative reasoning in a machine learning framework, and obtains one probability estimate for each biotransformation rule of the system. As the application of a rule then depends on a threshold for the probability estimate, the trade-off between recall (sensitivity) and precision (selectivity) can be addressed and leveraged in practice. Results: Results from leave-one-out cross-validation show that a recall and precision of ∼0.8 can be achieved for a subset of 13 transformation rules. Therefore, it is possible to optimize precision without compromising recall. We are currently integrating the results into an experimental version of the UM-PPS server. Availability: The program is freely available on the web at http://wwwkramer.in.tum.de/research/applications/biodegradation/data.

    Predicting biodegradation products and pathways: a hybrid knowledge- and machine learning-based approach

    Current methods for the prediction of biodegradation products and pathways of organic environmental pollutants either do not take into account domain knowledge or do not provide probability estimates. In this presentation, we show a hybrid knowledge-based and machine learning-based approach to overcome these limitations in the context of the University of Minnesota Pathway Prediction System (UM-PPS). The proposed solution performs relative reasoning in a machine learning framework, and obtains one probability estimate for each biotransformation rule of the system. Therefore, one model is learned for each transformation rule of the UM-PPS. The training set consists of a set of structures and the UM-PPS transformation rules they trigger. For these structures, the correctness of the transformation rules is known. Thus, one part of the classifier's input is structural information, represented by frequent acyclic substructures of the structure. Additionally, the classifier takes into account which other transformation rules are triggered by the UM-PPS. For classification we used Support Vector Machines and Random Forests. To apply the learned models, the transformation rules predicted by the UM-PPS are required. Given the structure, the models return, for each triggered transformation rule, the probability that it was correctly triggered. As the application of a rule then depends on a threshold for the probability estimate, the trade-off between recall (sensitivity) and precision (selectivity) can be addressed and leveraged in practice. We evaluated the classifiers by micro-averaging over all classifier outputs. Results from leave-one-out cross-validation show that a recall and precision of approximately 0.8 can be achieved for a subset of 13 transformation rules. Therefore, it is possible to optimize precision without compromising recall. The 13 rules are selected using the distribution and number of triggered rules and the correctness of each rule: we set a threshold for the number and ratio of positive examples to decide whether a rule is taken into account. We are currently integrating the results into an experimental version of the UM-PPS server.
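The thresholding and micro-averaged evaluation described in this abstract can be illustrated with a small sketch. It assumes that per-rule probability estimates are already available; the rule ids, probabilities and ground-truth sets below are made-up toy values, not results from the study, and the function names are hypothetical.

```python
def select_rules(probabilities, threshold):
    """Keep triggered rules whose estimated probability reaches the threshold."""
    return {rule for rule, p in probabilities.items() if p >= threshold}


def micro_precision_recall(predictions, truths):
    """Pool true/false positives and false negatives over all compounds.

    predictions, truths: lists of sets of rule ids, one pair per compound.
    """
    pairs = list(zip(predictions, truths))
    tp = sum(len(pred & true) for pred, true in pairs)
    fp = sum(len(pred - true) for pred, true in pairs)
    fn = sum(len(true - pred) for pred, true in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (assumed): per-compound probabilities for the triggered rules,
# and the subset of triggered rules known to be correct.
probs = [
    {"rule_a": 0.9, "rule_b": 0.4, "rule_c": 0.7},
    {"rule_a": 0.6, "rule_c": 0.2},
]
truths = [{"rule_a", "rule_b"}, {"rule_a"}]

for threshold in (0.3, 0.5, 0.8):
    predictions = [select_rules(p, threshold) for p in probs]
    precision, recall = micro_precision_recall(predictions, truths)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold increases precision and lowers recall, which is the trade-off the abstract refers to.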