15,593 research outputs found

    General properties of Bayesian learning as statistical inference determined by conditional expectations

    Get PDF
    We investigate the general properties of general Bayesian learning, where “general Bayesian learning” means inferring a state from another that is regarded as evidence, and where the inference is conditionalizing the evidence using the conditional expectation determined by a reference probability measure representing the background subjective degrees of belief of a Bayesian Agent performing the inference. States are linear functionals that encode probability measures by assigning expectation values to random variables via integrating them with respect to the probability measure. If a state can be learned from another this way, then it is said to be Bayes accessible from the evidence. It is shown that the Bayes accessibility relation is reflexive, antisymmetric and non-transitive. If every state is Bayes accessible from some other defined on the same set of random variables, then the set of states is called weakly Bayes connected. It is shown that the set of states is not weakly Bayes connected if the probability space is standard. The set of states is called weakly Bayes connectable if, given any state, the probability space can be extended in such a way that the given state becomes Bayes accessible from some other state in the extended space. It is shown that probability spaces are weakly Bayes connectable. Since conditioning using the theory of conditional expectations includes both Bayes’ rule and Jeffrey conditionalization as special cases, the results presented generalize substantially some results obtained earlier for Jeffrey conditionalization

    Credal Networks under Epistemic Irrelevance

    Get PDF
    A credal network under epistemic irrelevance is a generalised type of Bayesian network that relaxes its two main building blocks. On the one hand, the local probabilities are allowed to be partially specified. On the other hand, the assessments of independence do not have to hold exactly. Conceptually, these two features turn credal networks under epistemic irrelevance into a powerful alternative to Bayesian networks, offering a more flexible approach to graph-based multivariate uncertainty modelling. However, in practice, they have long been perceived as very hard to work with, both theoretically and computationally. The aim of this paper is to demonstrate that this perception is no longer justified. We provide a general introduction to credal networks under epistemic irrelevance, give an overview of the state of the art, and present several new theoretical results. Most importantly, we explain how these results can be combined to allow for the design of recursive inference methods. We provide numerous concrete examples of how this can be achieved, and use these to demonstrate that computing with credal networks under epistemic irrelevance is most definitely feasible, and in some cases even highly efficient. We also discuss several philosophical aspects, including the lack of symmetry, how to deal with probability zero, the interpretation of lower expectations, the axiomatic status of graphoid properties, and the difference between updating and conditioning

    Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle

    Full text link
    A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend to use it for sparse data instead of BDeu. Finally, will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.Comment: 20 pages, 4 figures; extended version submitted to Behaviormetrik
    • 

    corecore