9,090 research outputs found
The accuracy of a Bayesian Network
A Bayesian network is a construct that represents, and can therefore be used to model, a joint probability distribution. A principal characteristic of a Bayesian network is the degree to which it models a given joint probability distribution accurately: its accuracy. Although the accuracy of a Bayesian network is well defined in theory, it can rarely be determined in practice for real-world applications. Instead, alternative characteristics that relate to and reflect the accuracy are used in its place, and appropriate measures are devised for them. A popular formalism that adopts this approach is the Minimum Description Length (MDL) formalism, which models the accuracy of a Bayesian network as the probability of the network given the data set that describes the joint probability distribution the network models. However, in the context of Bayesian networks the MDL formalism exhibits several shortcomings and is therefore inappropriate for examining accuracy. An alternative framework is proposed, which models the accuracy of a Bayesian network as the accuracy of the conditional independencies implied by the structure of the network, and specifies an appropriate measure called the Network Conditional Independencies Mutual Information (NCIMI) measure. The proposed framework is inspired by the principles governing the field of Bayesian networks and is based on formal theoretical foundations. Experiments have been conducted on real-world problems that evaluate both the MDL formalism and the proposed framework. The experimental results support the theoretical claims and confirm the significance of the proposed framework.
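The independence-based view of accuracy can be illustrated with a generic plug-in estimator of conditional mutual information: if a network's structure implies X is independent of Y given Z, the estimate of I(X; Y | Z) should be near zero on data drawn from a matching distribution. This is an illustrative sketch, not the paper's NCIMI measure; the estimator and the toy data below are assumptions for the demo.

```python
import numpy as np

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits from discrete samples."""
    x, y, z = map(np.asarray, (x, y, z))
    cmi = 0.0
    for zv in np.unique(z):
        mask = z == zv
        pz = mask.mean()
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))  # p(x, y | z)
                if pxy == 0.0:
                    continue
                px = np.mean(xs == xv)                  # p(x | z)
                py = np.mean(ys == yv)                  # p(y | z)
                cmi += pz * pxy * np.log2(pxy / (px * py))
    return cmi

# Toy data matching the structure X <- Z -> Y: X and Y are marginally
# dependent but conditionally independent given Z.
rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, n)
x = z ^ (rng.random(n) < 0.2).astype(int)   # noisy copy of z
y = z ^ (rng.random(n) < 0.2).astype(int)   # another noisy copy of z

print(conditional_mutual_information(x, y, z))                 # near 0: independence holds
print(conditional_mutual_information(x, y, np.zeros(n, int)))  # marginal I(X;Y): clearly > 0
```

A structure whose implied independencies all score near zero in this sense fits the data well, which is the intuition behind measuring accuracy through conditional independencies.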
Context-tree weighting and Bayesian Context Trees: Asymptotic and non-asymptotic justifications
The Bayesian Context Trees (BCT) framework is a recently introduced, general
collection of statistical and algorithmic tools for modelling, analysis and
inference with discrete-valued time series. The foundation of this development
is built in part on some well-known information-theoretic ideas and techniques,
including Rissanen's tree sources and Willems et al.'s context-tree weighting
algorithm. This paper presents a collection of theoretical results that provide
mathematical justifications and further insight into the BCT modelling
framework and the associated practical tools. It is shown that the BCT prior
predictive likelihood (the probability of a time series of observations
averaged over all models and parameters) is both pointwise and minimax optimal,
in agreement with the MDL principle and the BIC criterion. The posterior
distribution is shown to be asymptotically consistent with probability one
(over both models and parameters), and asymptotically Gaussian (over the
parameters). And the posterior predictive distribution is also shown to be
asymptotically consistent with probability one.
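The context-tree weighting algorithm mentioned above admits a compact batch implementation for binary sequences: per-context Krichevsky-Trofimov estimates are mixed recursively over all tree models up to a maximum depth. The sketch below is a minimal illustrative version of plain CTW, not the BCT inference tools; the first-order Markov test source is invented for the demo.

```python
import math
import random
from collections import defaultdict

def kt_prob(a, b):
    """Krichevsky-Trofimov block probability of seeing a zeros and b ones."""
    p = 1.0
    for i in range(a):
        p *= (i + 0.5) / (i + 1.0)
    for j in range(b):
        p *= (j + 0.5) / (a + j + 1.0)
    return p

def ctw_probability(x, D):
    """CTW coding probability of x[D:], conditioned on the first D symbols."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(D, len(x)):
        ctx = tuple(x[t - 1 - k] for k in range(D))  # most recent symbol first
        for d in range(D + 1):                       # update every suffix node
            counts[ctx[:d]][x[t]] += 1

    def weighted(node):
        a, b = counts[node]
        pe = kt_prob(a, b)
        if len(node) == D:                           # leaf: memoryless estimate
            return pe
        # Mix "stop here" with "split on one more context symbol".
        return 0.5 * pe + 0.5 * weighted(node + (0,)) * weighted(node + (1,))

    return weighted(())

# First-order Markov source with flip probability 0.1 (entropy rate ~0.47 bits).
random.seed(1)
x = [0]
for _ in range(999):
    x.append(x[-1] if random.random() < 0.9 else 1 - x[-1])

D = 3
pw = ctw_probability(x, D)
rate = -math.log2(pw) / (len(x) - D)
print(round(rate, 3))   # compressed rate, close to the source entropy rate
```

The weighting step is what makes the coding probability a Bayesian mixture over all context trees of depth at most D, which is also how the BCT prior predictive likelihood arises.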
The robust selection of predictive genes via a simple classifier
Identifying genes that direct the mechanism of a disease from expression data is extremely useful in understanding how that mechanism works.
This in turn may lead to better diagnoses and potentially to a cure for that disease. This task becomes extremely challenging when the
data are characterised by only a small number of samples and a high number of dimensions, as is often the case with gene expression data.
Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation. These are the keys for the robust
identification of the most predictive features in such data. Within this framework, we propose a simple selective naĀØıve Bayes classifier discovered using a global search technique, and combine it with data perturbation to
increase its robustness to small sample sizes.
An extensive validation of the method was carried out using two applied datasets from the field of microarrays and a simulated dataset, all
confounded by small sample sizes and high dimensionality. The method has been shown capable of identifying genes previously confirmed or associated with prostate cancer and viral infections.
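The combination of a simple per-feature selector with data perturbation can be sketched as follows: rank features inside bootstrap replicates of the data and keep only those selected in most replicates. This is a much-simplified stand-in (a t-like ranking statistic rather than the authors' selective naïve Bayes with global search), run on synthetic data invented for the demo.

```python
import numpy as np

# Synthetic "expression" data: few samples, many features, 5 informative ones.
rng = np.random.default_rng(42)
n, p, k_informative = 40, 200, 5
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[:, :k_informative] += 2.0 * y[:, None]   # class-dependent shift

def top_features(X, y, k):
    """Rank features by a simple two-sample t-like statistic; return top-k."""
    m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
    s = X[y == 1].std(0) + X[y == 0].std(0) + 1e-9
    return np.argsort(-np.abs(m1 - m0) / s)[:k]

B, k = 50, 10
votes = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)            # bootstrap resample = data perturbation
    votes[top_features(X[idx], y[idx], k)] += 1

robust = np.where(votes / B >= 0.8)[0]     # kept in >= 80% of replicates
print(robust)   # contains the informative features 0-4 (possibly a few stable extras)
```

The design choice is that single-replicate rankings are noisy at small sample sizes, while selection frequency across perturbed replicates is a far more stable signal, which mirrors the paper's motivation.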
Strong Asymptotic Assertions for Discrete MDL in Regression and Classification
We study the properties of the MDL (or maximum penalized complexity)
estimator for Regression and Classification, where the underlying model class
is countable. We show in particular a finite bound on the Hellinger losses
under the only assumption that there is a "true" model contained in the class.
This implies almost sure convergence of the predictive distribution to the true
one at a fast rate. It corresponds to Solomonoff's central theorem of universal
induction, however with a bound that is exponentially larger.
Comment: 6 two-column pages
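A discrete MDL (penalized-complexity) estimator over a countable model class can be sketched with polynomial regression: each degree is scored by a two-part code length, data cost under the fitted model plus a parameter cost, and the minimizer is selected. The BIC-style penalty and the toy data below are assumptions for illustration, not the paper's exact setting.

```python
import numpy as np

# Toy regression data from a quadratic with Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
y = 1 + 2 * x - x**2 + rng.normal(0, 0.3, n)

def mdl_score(deg):
    """Two-part code length in nats (BIC-style approximation):
    negative log-likelihood of the residuals + cost of the parameters."""
    coeffs = np.polyfit(x, y, deg)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    data_cost = 0.5 * n * np.log(rss / n)        # fit term
    param_cost = 0.5 * (deg + 1) * np.log(n)     # complexity penalty
    return data_cost + param_cost

best = min(range(9), key=mdl_score)
print(best)   # typically recovers the true degree, 2
```

Higher degrees always reduce the residual sum of squares, but the penalty grows with every extra coefficient; the consistency results in the abstract say that, with enough data, this trade-off settles on the true model almost surely.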
- …