New Results for the MAP Problem in Bayesian Networks
This paper presents new results for the (partial) maximum a posteriori (MAP)
problem in Bayesian networks, which is the problem of querying the most
probable state configuration of some of the network variables given evidence.
First, it is demonstrated that the problem remains hard even in networks with
very simple topology, such as binary polytrees and simple trees (including the
Naive Bayes structure). Such proofs extend previous complexity results for the
problem. Inapproximability results are also derived in the case of trees if the
number of states per variable is not bounded. Although the problem is shown to
be hard and inapproximable even in very simple scenarios, a new exact algorithm
is described that is empirically fast in networks of bounded treewidth and
bounded number of states per variable. The same algorithm is used as the basis of a
Fully Polynomial Time Approximation Scheme for MAP under such assumptions.
Approximation schemes were generally thought to be impossible for this problem,
but we show otherwise for classes of networks that are important in practice.
The algorithms are extensively tested using some well-known networks as well as
randomly generated cases to show their effectiveness.
Comment: A couple of typos were fixed, as well as the notation in part of
section 4, which was misleading. Theoretical and empirical results have not
changed.
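As a rough illustration of the partial MAP query described in this abstract, the sketch below solves it by brute force on a tiny Naive Bayes network. The network, its probabilities, and the helper names (`joint`, `partial_map`) are hypothetical and chosen only for illustration; this is exhaustive enumeration, not the paper's exact algorithm or its approximation scheme.

```python
from itertools import product

# Hypothetical Naive Bayes network: binary class C with binary children X1, X2, X3.
p_c = [0.6, 0.4]  # P(C = c)
# p_x[i][c][x] = P(X_{i+1} = x | C = c); all numbers are made up.
p_x = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.1, 0.9]],
]

def joint(c, xs):
    """Joint probability P(C = c, X1 = xs[0], X2 = xs[1], X3 = xs[2])."""
    prob = p_c[c]
    for i, x in enumerate(xs):
        prob *= p_x[i][c][x]
    return prob

def partial_map(map_vars, evidence):
    """Maximize over map_vars, fix the evidence, and sum out everything else.

    Feature indices 0..2 stand for X1..X3; the class C is always summed out.
    Returns the best assignment and its unnormalized score P(map_vars, evidence).
    """
    best, best_score = None, -1.0
    for assignment in product([0, 1], repeat=len(map_vars)):
        fixed = dict(evidence)
        fixed.update(zip(map_vars, assignment))
        free = [i for i in range(3) if i not in fixed]
        score = 0.0
        for c in (0, 1):
            for free_vals in product([0, 1], repeat=len(free)):
                full = dict(fixed)
                full.update(zip(free, free_vals))
                score += joint(c, [full[i] for i in range(3)])
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Most probable configuration of (X1, X2) given X3 = 1, with C marginalized.
best, score = partial_map([0, 1], {2: 1})
```

The enumeration is exponential in the number of MAP variables, so it is only usable on toy examples like this one.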
Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks
We discuss the computational complexity of approximating maximum a posteriori
inference in sum-product networks. We first show NP-hardness in trees of height
two by a reduction from maximum independent set; this implies
non-approximability within a sublinear factor. We show that this is a tight
bound, as we can find an approximation within a linear factor in networks of
height two. We then show that, in trees of height three, it is NP-hard to
approximate the problem within a factor 2^{f(n)} for any sublinear function f
of the size of the input n. Again, this bound is tight, as we prove that
the usual max-product algorithm finds (in any network) approximations within
factor 2^{c n} for some constant c < 1. Last, we present a simple
algorithm, and show that it provably produces solutions at least as good as,
and potentially much better than, the max-product algorithm. We empirically
analyze the proposed algorithm against max-product using synthetic and
realistic networks.
Comment: 18 pages
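To make the gap between max-product and exact MAP concrete, here is a minimal sketch on a hypothetical height-two network: a sum node over three product nodes, each with one univariate leaf per variable. The weights, leaf distributions, and function names are assumptions made for this example, and the max-product routine is a simplified version written for this specific shape of network.

```python
from itertools import product

# Hypothetical sum-product network over binary X and Y:
# root sum node -> three product nodes -> univariate leaf distributions.
weights = [0.4, 0.3, 0.3]
components = [
    ([0.9, 0.1], [0.9, 0.1]),  # (P(X), P(Y)) under component 0
    ([0.1, 0.9], [0.2, 0.8]),  # component 1
    ([0.2, 0.8], [0.1, 0.9]),  # component 2
]

def spn_value(x, y):
    """Value of the network at the complete configuration (x, y)."""
    return sum(w * px[x] * py[y] for w, (px, py) in zip(weights, components))

def max_product():
    """Replace the sum by a max: pick the best single component and its leaf maxima."""
    best_val, best_cfg = -1.0, None
    for w, (px, py) in zip(weights, components):
        x = max((0, 1), key=lambda v: px[v])
        y = max((0, 1), key=lambda v: py[v])
        val = w * px[x] * py[y]
        if val > best_val:
            best_val, best_cfg = val, (x, y)
    return best_cfg

def exact_map():
    """Exhaustive search over all configurations (feasible only for tiny networks)."""
    return max(product((0, 1), repeat=2), key=lambda cfg: spn_value(*cfg))
```

In this example max-product commits to the single heaviest component and returns (0, 0), whose network value is 0.336, while exhaustive search finds (1, 1) with value 0.436 because two lighter components agree on it.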
On Pruning for Score-Based Bayesian Network Structure Learning
Many algorithms for score-based Bayesian network structure learning (BNSL),
in particular exact ones, take as input a collection of potentially optimal
parent sets for each variable in the data. Constructing such collections
naively is computationally intensive since the number of parent sets grows
exponentially with the number of variables. Thus, pruning techniques are not
only desirable but essential. While good pruning rules exist for the Bayesian
Information Criterion (BIC), current results for the Bayesian Dirichlet
equivalent uniform (BDeu) score reduce the search space very modestly,
hampering the use of the (often preferred) BDeu. We derive new non-trivial
theoretical upper bounds for the BDeu score that considerably improve on the
state-of-the-art. Since the new bounds are mathematically proven to be tighter
than previous ones and at little extra computational cost, they are a promising
addition to BNSL methods.
Confidence Statements for Ordering Quantiles
This work proposes Quor, a simple yet effective nonparametric method to
compare independent samples with respect to corresponding quantiles of their
populations. The method is solely based on the order statistics of the samples,
and independence is its only requirement. All computations are performed using
exact distributions with no need for any asymptotic considerations, and yet can
be run using a fast quadratic-time dynamic programming idea. Computational
performance is essential in high-dimensional domains, such as gene expression
data. We describe the approach and discuss the most important assumptions,
building a parallel with assumptions and properties of widely used techniques
for the same problem. Experiments using real data from biomedical studies are
performed to empirically compare Quor and other methods in a classification
task over a selection of high-dimensional data sets.
Anytime Marginal MAP Inference
This paper presents a new anytime algorithm for the marginal MAP problem in
graphical models. The algorithm is described in detail, its complexity and
convergence rate are studied, and relations to previous theoretical results for
the problem are discussed. It is shown that the algorithm runs in
polynomial time if the underlying graph of the model has bounded tree-width,
and that it provides guarantees on the lower and upper bounds obtained within a
fixed amount of computational resources. Experiments with both real and
synthetically generated models highlight its main characteristics and show that it
compares favorably against Park and Darwiche's systematic search, particularly
in the case of problems with many MAP variables and moderate tree-width.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
Learning Bounded Treewidth Bayesian Networks with Thousands of Variables
We present a method for learning treewidth-bounded Bayesian networks from
data sets containing thousands of variables. Bounding the treewidth of a
Bayesian network greatly reduces the complexity of inferences. Yet, being a global
property of the graph, it considerably increases the difficulty of the learning
process. We propose a novel algorithm for this task, able to scale to large
domains and large treewidths. Our novel approach consistently outperforms the
state of the art on data sets with up to ten thousand variables.
Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables
Tests for dependence of continuous, discrete and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we present three Bayesian tests for dependence of binary, continuous and mixed variables. These tests are nonparametric and based on the Dirichlet Process, which allows us to use the same prior model for all of them. Therefore, the tests are “consistent” with each other, in the sense that the probabilities that variables are dependent computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests.
Advances in Learning Bayesian Networks of Bounded Treewidth
This work presents novel algorithms for learning Bayesian network structures
with bounded treewidth. Both exact and approximate methods are developed. The
exact method combines mixed-integer linear programming formulations for
structure learning and treewidth computation. The approximate method consists
in uniformly sampling k-trees (maximal graphs of treewidth k), and
subsequently selecting, exactly or approximately, the best structure whose
moral graph is a subgraph of that k-tree. Some properties of these methods
are discussed and proven. The approaches are empirically compared to each other
and to a state-of-the-art method for learning bounded treewidth structures on a
collection of public data sets with up to 100 variables. The experiments show
that our exact algorithm outperforms the state of the art, and that the
approximate approach is fairly accurate.
Comment: 23 pages, 2 figures, 3 tables
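For readers unfamiliar with k-trees, the sketch below builds one by the standard incremental construction: start from a clique on k+1 vertices, then attach each new vertex to every member of an existing k-clique. The function name `random_k_tree` and its parameters are hypothetical. This construction does not sample uniformly over k-trees, so it only illustrates the object being sampled, not the sampling scheme the abstract refers to.

```python
import random
from itertools import combinations

def random_k_tree(n, k, seed=0):
    """Return the edge set of a k-tree on vertices 0..n-1 (requires n >= k + 1).

    Note: picking the attachment clique at random like this is NOT uniform
    over k-trees; it is only a simple way to generate a valid one.
    """
    rng = random.Random(seed)
    # Start from a complete graph on the first k + 1 vertices.
    edges = set(combinations(range(k + 1), 2))
    k_cliques = list(combinations(range(k + 1), k))
    for v in range(k + 1, n):
        base = rng.choice(k_cliques)
        # Connect the new vertex to every vertex of the chosen k-clique.
        edges.update((u, v) for u in base)
        # Each (k-1)-subset of the base plus v forms a new k-clique.
        for sub in combinations(base, k - 1):
            k_cliques.append(sub + (v,))
    return edges

edges = random_k_tree(10, 3)
```

Every k-tree on n vertices has k(k+1)/2 + (n-k-1)k edges, which gives a quick sanity check on the output.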
Learning Bayesian Networks with Incomplete Data by Augmentation
We present new algorithms for learning Bayesian networks from data with
missing values using a data augmentation approach. An exact Bayesian network
learning algorithm is obtained by recasting the problem into a standard
Bayesian network learning problem without missing data. To the best of our
knowledge, this is the first exact algorithm for this problem. As expected, the
exact algorithm does not scale to large domains. We build on the exact method
to create an approximate algorithm using a hill-climbing technique. This
algorithm scales to large domains so long as a suitable standard structure
learning method for complete data is available. We perform a wide range of
experiments to demonstrate the benefits of learning Bayesian networks with this
new approach.