5 research outputs found

    Integrating and Ranking Uncertain Scientific Data

    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve the prediction of well-known functions). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.

    Explaining inference on a population of independent agents using Bayesian networks

    The main goal of this research is to design, implement, and evaluate a novel explanation method, the hierarchical explanation method (HEM), for explaining Bayesian network (BN) inference when the network is modeling a population of conditionally independent agents, each of which is modeled as a subnetwork. For example, consider disease-outbreak detection in which the agents are patients who are modeled as independent, conditioned on the factors that cause disease spread. Given evidence about these patients, such as their symptoms, suppose that the BN system infers that a respiratory anthrax outbreak is highly likely. A public-health official who received such a report would generally want to know why anthrax is being given a high posterior probability. The HEM explains such inferences. The explanation approach is applicable in general to inference on BNs that model conditionally independent agents; it complements previous approaches for explaining inference on BNs that model a single agent (e.g., for explaining the diagnostic inference for a single patient using a BN that models just that patient). The hypotheses that were tested are: (1) the proposed explanation method provides information that helps a user to understand how and why the inference results have been obtained, and (2) the proposed explanation method helps to improve the quality of the inferences that users draw from evidence.

    Building Bayesian Networks: Elicitation, Evaluation, and Learning

    As a compact graphical framework for representation of multivariate probability distributions, Bayesian networks are widely used for efficient reasoning under uncertainty in a variety of applications, from medical diagnosis to computer troubleshooting and airplane fault isolation. However, construction of Bayesian networks is often considered the main difficulty when applying this framework to real-world problems. In real-world domains, Bayesian networks are often built by a knowledge engineering approach. Unfortunately, eliciting knowledge from domain experts is a very time-consuming process, and can result in poor-quality graphical models when not performed carefully. Over the last decade, the research focus has been shifting towards learning Bayesian networks from data, especially with increasing volumes of data available in various applications, such as biomedical, internet, and e-business, among others. Aiming to solve the bottleneck problem of building Bayesian network models, this research work focuses on elicitation, evaluation, and learning of Bayesian networks. Specifically, the contribution of this dissertation involves research in the following five areas: a) graphical user interface tools for efficient elicitation and navigation of probability distributions; b) systematic and objective evaluation of elicitation schemes for probabilistic models; c) valid evaluation of performance robustness, i.e., sensitivity, of Bayesian networks; d) the sensitivity inequivalent characteristic of Markov equivalent networks, and the appropriateness of using sensitivity for model selection in learning Bayesian networks; e) selective refinement for learning probability parameters of Bayesian networks from limited data with availability of expert knowledge. In addition, an efficient algorithm for fast sensitivity analysis is developed based on a relevance reasoning technique. The implemented algorithm runs very fast and makes d) and e) more affordable for real domain practice.