An Empirical-Bayes Score for Discrete Bayesian Networks
Bayesian network structure learning is often performed in a Bayesian setting,
by evaluating candidate structures using their posterior probabilities for a
given data set. Score-based algorithms then use those posterior probabilities
as an objective function and return the maximum a posteriori network as the
learned model. For discrete Bayesian networks, the canonical choice for a
posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal
likelihood with a uniform (U) graph prior (Heckerman et al., 1995). Its
favourable theoretical properties derive from assuming a uniform prior both on
the space of network structures and on the space of network parameters. In this
paper, we revisit the limitations of these assumptions, and we
introduce an alternative set of assumptions and the resulting score: the
Bayesian Dirichlet sparse (BDs) empirical Bayes marginal likelihood with a
marginal uniform (MU) graph prior. We evaluate its performance in an extensive
simulation study, showing that MU+BDs is more accurate than U+BDeu both in
learning the structure of the network and in predicting new observations, while
not being computationally more complex to estimate. Comment: 12 pages, PGM 201
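The practical difference between the two scores is compact enough to sketch. Both are Bayesian-Dirichlet marginal likelihoods; BDs differs from BDeu only in dividing the imaginary sample size by the number of parent configurations actually observed in the data rather than by all possible ones. A minimal sketch of both local scores (our own illustration, not the paper's code):

```python
import numpy as np
from scipy.special import gammaln

def bd_local_score(counts, iss=1.0, sparse=False):
    """Log marginal likelihood of one node given its parents.

    counts: 2-D array, rows = parent configurations, columns = node states.
    iss:    imaginary sample size (the only hyperparameter).
    sparse: False -> BDeu (prior mass spread over ALL parent configurations);
            True  -> BDs  (prior mass spread only over the configurations
                           actually observed in the data).
    """
    q, r = counts.shape
    q_eff = np.count_nonzero(counts.sum(axis=1)) if sparse else q
    a_j = iss / q_eff          # Dirichlet mass per parent configuration
    a_jk = a_j / r             # Dirichlet mass per contingency-table cell
    score = 0.0
    for row in counts:
        score += gammaln(a_j) - gammaln(a_j + row.sum())
        score += np.sum(gammaln(a_jk + row) - gammaln(a_jk))
    return score
```

Unobserved parent configurations contribute zero to the sum, so the two scores coincide whenever every configuration occurs in the data; they diverge exactly on the sparse samples the paper targets.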
Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
A classic approach for learning Bayesian networks from data is to identify a
maximum a posteriori (MAP) network structure. In the case of discrete Bayesian
networks, MAP networks are selected by maximising one of several possible
Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet
equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties
of BDeu arise from its uniform prior over the parameters of each local
distribution in the network: it makes structure learning computationally
efficient, it does not require the elicitation of prior knowledge from experts,
and it satisfies score equivalence.
In this paper we will review the derivation and the properties of BD scores,
and of BDeu in particular, and we will link them to the corresponding entropy
estimates to study them from an information theoretic perspective. To this end,
we will work in the context of the foundational work of Giffin and Caticha
(2007), who showed that Bayesian inference can be framed as a particular case
of the maximum relative entropy principle. We will use this connection to show
that BDeu should not be used for structure learning from sparse data, since it
violates the maximum relative entropy principle; and that it is also
problematic from a more classic Bayesian model selection perspective, because
it produces Bayes factors that are sensitive to the value of its only
hyperparameter. Using a large simulation study, we found in our previous work
(Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide
better accuracy in structure learning; in this paper we further show that BDs
does not suffer from the issues above, and we recommend using it for sparse
data instead of BDeu. Finally, we show that these issues are in fact
different aspects of the same problem and a consequence of the distributional
assumptions of the prior. Comment: 20 pages, 4 figures; extended version submitted to Behaviormetrik
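The Bayes-factor sensitivity described above is easy to reproduce numerically. In the sketch below (our own toy example, not taken from the paper), the BDeu Bayes factor comparing X → Y against the empty graph over two binary variables shifts substantially as the imaginary sample size moves from 1 to 10 on a sparse sample of five observations:

```python
import numpy as np
from scipy.special import gammaln

def bdeu_node(counts, iss):
    """BDeu log marginal likelihood of one node (rows = parent configs)."""
    q, r = counts.shape
    a_j, a_jk = iss / q, iss / (q * r)
    score = 0.0
    for row in counts:
        score += gammaln(a_j) - gammaln(a_j + row.sum())
        score += np.sum(gammaln(a_jk + row) - gammaln(a_jk))
    return score

# Sparse joint counts n[x, y] for two binary variables (5 observations).
n = np.array([[4, 0],
              [0, 1]])
x_marg = n.sum(axis=1)[None, :]   # counts of X under its single (empty) parent config
y_marg = n.sum(axis=0)[None, :]   # counts of Y, likewise

def log_bayes_factor(iss):
    """log Bayes factor of the structure X -> Y against the empty graph."""
    dependent   = bdeu_node(x_marg, iss) + bdeu_node(n, iss)
    independent = bdeu_node(x_marg, iss) + bdeu_node(y_marg, iss)
    return dependent - independent

# The Bayes factor is not stable in the hyperparameter: its log shifts
# substantially between iss = 1 and iss = 10 on this tiny sample.
```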
A new hybrid method for Bayesian network learning with dependency constraints
A Bayes net has qualitative and quantitative aspects: the qualitative aspect is its graphical structure, which corresponds to correlations among the variables in the Bayes net; the quantitative aspects are the net parameters. This paper develops a hybrid criterion for learning Bayes net structures that is based on both aspects. We combine model selection criteria measuring data fit with correlation information from statistical tests: given a sample d, search for a structure G that maximizes score(G, d) over the set of structures G that satisfy the dependencies detected in d. We rely on the statistical test only to accept conditional dependencies, not conditional independencies. We show how to adapt local search algorithms to accommodate the observed dependencies. Simulation studies with GES search and the BDeu/BIC scores provide evidence that the additional dependency information leads to Bayes nets that better fit the target model in distribution and structure.
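The procedure in the abstract — accept pairwise dependencies with a statistical test, then maximize a score only over structures consistent with them — can be illustrated at toy scale. The sketch below is our own simplification: it uses chi-square tests with BIC, an exhaustive search over three variables, and direct adjacency as a stand-in for "the structure entails the dependence" (the paper instead adapts local search such as GES):

```python
from itertools import combinations, product

import numpy as np
from scipy.stats import chi2_contingency

def detected_dependencies(data, alpha=0.05):
    """Pairwise chi-square tests; only DEPENDENCIES are accepted,
    never independencies, mirroring the asymmetric use of the tests."""
    deps = set()
    for i, j in combinations(range(data.shape[1]), 2):
        table = np.zeros((data[:, i].max() + 1, data[:, j].max() + 1))
        for row in data:
            table[row[i], row[j]] += 1
        if chi2_contingency(table)[1] < alpha:
            deps.add(frozenset((i, j)))
    return deps

def bic(data, parents):
    """Decomposable BIC score of a discrete Bayes net."""
    n = len(data)
    score = 0.0
    for v, ps in parents.items():
        counts = {}
        for row in data:
            cond = counts.setdefault(tuple(row[p] for p in ps), {})
            cond[row[v]] = cond.get(row[v], 0) + 1
        for cond in counts.values():
            n_j = sum(cond.values())
            score += sum(k * np.log(k / n_j) for k in cond.values())
        r = data[:, v].max() + 1
        q = int(np.prod([data[:, p].max() + 1 for p in ps])) if ps else 1
        score -= 0.5 * np.log(n) * q * (r - 1)
    return score

def is_acyclic(parents):
    remaining = dict(parents)
    while remaining:
        roots = [v for v in remaining if not set(remaining[v]) & remaining.keys()]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
    return True

def best_constrained_dag(data, deps):
    """Exhaustive search (fine for 3 variables): maximize BIC over DAGs
    whose skeleton makes every detected dependent pair adjacent."""
    p = data.shape[1]
    per_node = []
    for v in range(p):
        others = [u for u in range(p) if u != v]
        per_node.append([s for k in range(len(others) + 1)
                         for s in combinations(others, k)])
    best, best_score = None, -np.inf
    for assignment in product(*per_node):
        parents = dict(enumerate(assignment))
        if not is_acyclic(parents):
            continue
        adjacent = {frozenset((v, u)) for v in parents for u in parents[v]}
        if not deps <= adjacent:
            continue  # structure fails to represent a detected dependency
        s = bic(data, parents)
        if s > best_score:
            best, best_score = parents, s
    return best
```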
On Pruning for Score-Based Bayesian Network Structure Learning
Many algorithms for score-based Bayesian network structure learning (BNSL),
in particular exact ones, take as input a collection of potentially optimal
parent sets for each variable in the data. Constructing such collections
naively is computationally intensive since the number of parent sets grows
exponentially with the number of variables. Thus, pruning techniques are not
only desirable but essential. While good pruning rules exist for the Bayesian
Information Criterion (BIC), current results for the Bayesian Dirichlet
equivalent uniform (BDeu) score reduce the search space very modestly,
hampering the use of the (often preferred) BDeu. We derive new non-trivial
theoretical upper bounds for the BDeu score that considerably improve on the
state-of-the-art. The new bounds are mathematically proven to be tighter than
previous ones and come at little extra computational cost, making them a
promising addition to BNSL methods.
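The paper's new BDeu bounds are not reproduced here, but the general shape of parent-set pruning is simple to illustrate. For any decomposable score, a candidate parent set can be discarded whenever some proper subset scores at least as well, because substituting the subset never lowers the network score and never introduces a cycle. A sketch of that generic rule (our own illustration):

```python
from itertools import combinations

def prune_parent_sets(scores):
    """Keep only potentially optimal parent sets.

    scores: dict mapping frozenset (parent set) -> local score (higher is
    better). A set S is pruned whenever some proper subset T scores at
    least as well, since swapping S for T can only raise the network score.
    """
    kept = {}
    for s, sc in scores.items():
        dominated = any(
            scores.get(frozenset(t), float("-inf")) >= sc
            for k in range(len(s))
            for t in combinations(s, k)
        )
        if not dominated:
            kept[s] = sc
    return kept
```

On real instances rules like this (and the tighter score-specific upper bounds the paper derives) shrink the exponential list of candidate parent sets before the exact search even starts.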
Learning All Credible Bayesian Network Structures for Model Averaging
A Bayesian network is a widely used probabilistic graphical model with
applications in knowledge discovery and prediction. Learning a Bayesian network
(BN) from data can be cast as an optimization problem using the well-known
score-and-search approach. However, selecting a single model (i.e., the
best-scoring BN) can be misleading or may not achieve the best possible accuracy. An
alternative to committing to a single model is to perform some form of Bayesian
or frequentist model averaging, where the space of possible BNs is sampled or
enumerated in some fashion. Unfortunately, existing approaches for model
averaging either severely restrict the structure of the Bayesian network or
have only been shown to scale to networks with fewer than 30 random variables.
In this paper, we propose a novel approach to model averaging inspired by
performance guarantees in approximation algorithms. Our approach has two
primary advantages. First, our approach only considers credible models in that
they are optimal or near-optimal in score. Second, our approach is more
efficient and scales to significantly larger Bayesian networks than existing
approaches. Comment: under review by JMLR. arXiv admin note: substantial text overlap with
arXiv:1811.0503
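A minimal version of the underlying idea — keep only networks whose score is close to the optimum, then average over them — can be sketched as follows. This is our own simplification; the paper's actual method derives the credible set from approximation-algorithm-style guarantees and scales far beyond toy enumeration. The `factor` threshold and the softmax-style weighting are illustrative choices:

```python
import math

def credible_edge_probabilities(candidates, factor=math.log(20)):
    """candidates: list of (edges, log_score) pairs, one per network.

    Keep only 'credible' networks -- those whose log score is within
    `factor` of the best -- then average the edge indicators, weighting
    each kept network by its renormalized score.
    """
    best = max(s for _, s in candidates)
    kept = [(e, s) for e, s in candidates if best - s <= factor]
    z = sum(math.exp(s - best) for _, s in kept)
    probs = {}
    for edges, s in kept:
        w = math.exp(s - best) / z
        for edge in edges:
            probs[edge] = probs.get(edge, 0.0) + w
    return probs
```

Restricting the average to near-optimal networks is what distinguishes this from sampling-based model averaging: far-from-optimal structures contribute nothing, by construction.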
Exact Learning Augmented Naive Bayes Classifier
Earlier studies have shown that classification accuracies of Bayesian
networks (BNs) obtained by maximizing the conditional log likelihood (CLL) of a
class variable, given the feature variables, were higher than those obtained by
maximizing the marginal likelihood (ML). However, differences between the
performances of the two scores in the earlier studies may be attributed to the
fact that they used approximate learning algorithms, not exact ones. This paper
compares the classification accuracies of BNs with approximate learning using
CLL to those with exact learning using ML. The results demonstrate that the
classification accuracies of BNs obtained by maximizing the ML are higher than
those obtained by maximizing the CLL for large data. However, the results also
demonstrate that the classification accuracies of exact learning BNs using the
ML are much worse than those of other methods when the sample size is small and
the class variable has numerous parents. To resolve the problem, we propose an
exact learning augmented naive Bayes classifier (ANB), which guarantees that
the class variable has no parents. The proposed method is guaranteed to
asymptotically estimate the same class posterior as the exactly learned BN.
Comparison experiments demonstrated the superior performance of the proposed
method. Comment: 29 pages
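The structural constraint at the heart of an ANB — the class variable has no parents and points to every feature — can be illustrated with the classic Chow-Liu construction of a tree-augmented naive Bayes model. The sketch below shows that well-known construction, not the paper's exact-learning algorithm (which instead maximizes the marginal likelihood exactly under the constraint):

```python
from itertools import combinations

import numpy as np

def cond_mutual_info(x, y, c):
    """Empirical conditional mutual information I(X; Y | C) in nats."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        pc = mask.mean()
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))
                if pxy == 0.0:
                    continue
                px, py = np.mean(xs == xv), np.mean(ys == yv)
                cmi += pc * pxy * np.log(pxy / (px * py))
    return cmi

def tan_structure(X, c):
    """Tree-augmented naive Bayes: the class has no parents and is a parent
    of every feature; each feature gains at most one other feature parent."""
    p = X.shape[1]
    weights = {(i, j): cond_mutual_info(X[:, i], X[:, j], c)
               for i, j in combinations(range(p), 2)}
    # Kruskal's algorithm for a MAXIMUM-weight spanning tree over features.
    comp = list(range(p))
    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]
            v = comp[v]
        return v
    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            comp[ri] = rj
            tree.append((i, j))
    # Orient tree edges away from feature 0; add the class as parent of all.
    adj = {v: [] for v in range(p)}
    for i, j in tree:
        adj[i].append(j)
        adj[j].append(i)
    parents = {v: ["class"] for v in range(p)}
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                parents[v].append(u)
                stack.append(v)
    return parents
```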
Benchpress: a scalable and platform-independent workflow for benchmarking structure learning algorithms for graphical models
Describing the relationship between the variables in a study domain and
modelling the data generating mechanism is a fundamental problem in many
empirical sciences. Probabilistic graphical models are one common approach to
tackle the problem. Learning the graphical structure is computationally
challenging and an active area of current research, with a plethora of
algorithms being developed. To facilitate the benchmarking of different
methods, we present a novel automated workflow, called Benchpress, for producing
scalable, reproducible, and platform-independent benchmarks of structure
learning algorithms for probabilistic graphical models. Benchpress is
interfaced via a simple JSON-file, which makes it accessible for all users,
while the code is designed in a fully modular fashion to enable researchers to
contribute additional methodologies. Benchpress currently provides an interface
to a large number of state-of-the-art algorithms from libraries such as BiDAG,
bnlearn, GOBNILP, pcalg, r.blip, scikit-learn, TETRAD, and trilearn as well as
a variety of methods for data generating models and performance evaluation.
Alongside user-defined models and randomly generated datasets, the software
tool also includes a number of standard datasets and graphical models from the
literature, which may be included in a benchmarking workflow. We demonstrate
the applicability of this workflow for learning Bayesian networks in four
typical data scenarios. The source code and documentation are publicly available
from http://github.com/felixleopoldo/benchpress. Comment: 30 pages, 1 figure
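Since Benchpress is driven by a single JSON file, a benchmark run reduces to writing a specification and pointing the workflow at it. The snippet below is purely illustrative: every key name is a hypothetical stand-in, not Benchpress's real schema, which is documented in the repository linked above.

```python
import json

# Hypothetical benchmark specification -- the key names below are invented
# stand-ins for illustration only; consult the Benchpress documentation for
# the actual JSON schema.
spec = {
    "data": [{"id": "sim", "graph": "random_dag", "n_samples": 1000}],
    "algorithms": [{"id": "hc_bdeu", "library": "bnlearn"},
                   {"id": "exact_ilp", "library": "GOBNILP"}],
    "evaluation": ["shd", "roc"],
}
spec_json = json.dumps(spec, indent=2)  # what a workflow of this kind would read
```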
Learning Locally Minimax Optimal Bayesian Networks
We consider the problem of learning Bayesian network models in a non-informative setting, where the only available information is a set of observational data, and no background knowledge is available. The problem can be divided into two different subtasks: learning the structure of the network (a set of independence relations), and learning the parameters of the model (that fix the probability distribution from the set of all distributions consistent with the chosen structure). There are not many theoretical frameworks that consistently handle both these problems together, the Bayesian framework being an exception. In this paper we propose an alternative, information-theoretic framework which sidesteps some of the technical problems facing the Bayesian approach. The framework is based on the minimax-optimal Normalized Maximum Likelihood (NML) distribution, which is motivated by the Minimum Description Length (MDL) principle. The resulting model selection criterion is consistent, and it provides a way to construct highly predictive Bayesian network models. Our empirical tests show that the proposed method compares favorably with alternative approaches in both model selection and prediction tasks.
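For reference, the NML distribution underlying this criterion is defined, in standard MDL notation (this is the textbook form, not a formula reproduced from the paper), by normalizing the maximized likelihood over all possible data sets of the same size:

```latex
P_{\mathrm{NML}}(x^n \mid M) =
  \frac{P\!\left(x^n \mid \hat{\theta}(x^n), M\right)}
       {\sum_{y^n} P\!\left(y^n \mid \hat{\theta}(y^n), M\right)}
```

so that the selection criterion \( \log P_{\mathrm{NML}} \) equals the maximized log-likelihood penalized by the log of the normalizing sum, known as the parametric complexity of the model class \( M \).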