Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
A classic approach for learning Bayesian networks from data is to identify a
maximum a posteriori (MAP) network structure. In the case of discrete Bayesian
networks, MAP networks are selected by maximising one of several possible
Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet
equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties
of BDeu arise from its uniform prior over the parameters of each local
distribution in the network: it makes structure learning computationally
efficient, it does not require the elicitation of prior knowledge from experts,
and it satisfies score equivalence.
In this paper we will review the derivation and the properties of BD scores,
and of BDeu in particular, and we will link them to the corresponding entropy
estimates to study them from an information-theoretic perspective. To this end,
we will work in the context of the foundational work of Giffin and Caticha
(2007), who showed that Bayesian inference can be framed as a particular case
of the maximum relative entropy principle. We will use this connection to show
that BDeu should not be used for structure learning from sparse data, since it
violates the maximum relative entropy principle; and that it is also
problematic from a more classic Bayesian model selection perspective, because
it produces Bayes factors that are sensitive to the value of its only
hyperparameter. Using a large simulation study, we found in our previous work
(Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide
better accuracy in structure learning; in this paper we further show that BDs
does not suffer from the issues above, and we recommend using it instead of
BDeu for sparse data. Finally, we will show that these issues are in fact
different aspects of the same problem and a consequence of the distributional
assumptions of the prior.
Comment: 20 pages, 4 figures; extended version submitted to Behaviormetrika
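To make the hyperparameter sensitivity concrete, below is a minimal Python sketch of the local BDeu marginal likelihood for a single node, computed from a contingency table of counts; all names are illustrative, with iss standing for the imaginary sample size, BDeu's only hyperparameter.

from math import lgamma

def bdeu_local_score(counts, iss):
    # counts[j][k]: observations with parent configuration j and node state k.
    q = len(counts)     # number of parent configurations
    r = len(counts[0])  # number of states of the node
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(iss / q) - lgamma(iss / q + n_j)
        for n_jk in row:
            score += lgamma(iss / (q * r) + n_jk) - lgamma(iss / (q * r))
    return score

# A sparse table: two of the four parent configurations are never observed.
sparse = [[2, 0], [0, 1], [0, 0], [0, 0]]
for iss in (1.0, 10.0):
    print(iss, bdeu_local_score(sparse, iss))

On sparse tables like this one the score, and hence the Bayes factor between two candidate structures, shifts noticeably as iss changes, which is the sensitivity discussed above.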
On Pruning for Score-Based Bayesian Network Structure Learning
Many algorithms for score-based Bayesian network structure learning (BNSL),
in particular exact ones, take as input a collection of potentially optimal
parent sets for each variable in the data. Constructing such collections
naively is computationally intensive since the number of parent sets grows
exponentially with the number of variables. Thus, pruning techniques are not
only desirable but essential. While good pruning rules exist for the Bayesian
Information Criterion (BIC), current results for the Bayesian Dirichlet
equivalent uniform (BDeu) score reduce the search space very modestly,
hampering the use of the (often preferred) BDeu. We derive new non-trivial
theoretical upper bounds for the BDeu score that considerably improve on the
state-of-the-art. Since the new bounds are provably tighter than previous ones
and come at little extra computational cost, they are a promising addition to
BNSL methods.
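As a rough illustration of how such upper bounds are used (the bounds derived in the paper are not reproduced here), the Python sketch below enumerates candidate parent sets for a variable and discards, without scoring, any set whose assumed upper_bound cannot beat the best score among its subsets; local_score and upper_bound are hypothetical helpers.

from itertools import combinations

def candidate_parent_sets(v, others, local_score, upper_bound, max_size=3):
    # kept maps each retained parent set of v to its score; the empty
    # parent set is always scored and kept as a baseline.
    kept = {frozenset(): local_score(v, frozenset())}
    for size in range(1, max_size + 1):
        for ps in map(frozenset, combinations(others, size)):
            # Best score among kept proper subsets of ps (non-empty,
            # since the empty set is a proper subset of any ps here).
            best_sub = max(s for sub, s in kept.items() if sub < ps)
            # Prune without scoring: a tighter bound discards more sets.
            if upper_bound(v, ps) <= best_sub:
                continue
            s = local_score(v, ps)
            if s > best_sub:  # keep only parent sets that improve on
                kept[ps] = s  # every subset (potentially optimal)
    return kept

The tighter the bound, the more candidates fail the test before being scored, which is where the reduction of the search space comes from.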
An Empirical-Bayes Score for Discrete Bayesian Networks
Bayesian network structure learning is often performed in a Bayesian setting,
by evaluating candidate structures using their posterior probabilities for a
given data set. Score-based algorithms then use those posterior probabilities
as an objective function and return the maximum a posteriori network as the
learned model. For discrete Bayesian networks, the canonical choice for a
posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal
likelihood with a uniform (U) graph prior (Heckerman et al., 1995). Its
favourable theoretical properties derive from assuming a uniform prior both on
the space of the network structures and on the space of the parameters of the
network. In this paper, we revisit the limitations of these assumptions and
introduce an alternative set of assumptions and the resulting score: the
Bayesian Dirichlet sparse (BDs) empirical Bayes marginal likelihood with a
marginal uniform (MU) graph prior. We evaluate its performance in an extensive
simulation study, showing that MU+BDs is more accurate than U+BDeu both in
learning the structure of the network and in predicting new observations, while
not being computationally more complex to estimate.
Comment: 12 pages, PGM 2016
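As a sketch of the difference between the two scores: following the paper's description, BDs spreads the imaginary sample size only over the parent configurations actually observed in the data, rather than over all of them as BDeu does. The Python fragment below is a minimal, illustrative rendering of the local BDs score under that assumption; counts[j][k] is a contingency table as in the BDeu sketch above.

from math import lgamma

def bds_local_score(counts, iss):
    # counts[j][k]: observations with parent configuration j and node state k.
    r = len(counts[0])
    observed = [row for row in counts if sum(row) > 0]
    q_tilde = len(observed)  # parent configurations seen in the data
    score = 0.0
    for row in observed:
        n_j = sum(row)
        score += lgamma(iss / q_tilde) - lgamma(iss / q_tilde + n_j)
        for n_jk in row:
            score += (lgamma(iss / (q_tilde * r) + n_jk)
                      - lgamma(iss / (q_tilde * r)))
    return score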
Bayesian Learning of Sum-Product Networks
Sum-product networks (SPNs) are flexible density estimators and have received
significant attention due to their attractive inference properties. While
parameter learning in SPNs is well developed, structure learning leaves
something to be desired: Even though there is a plethora of SPN structure
learners, most of them are somewhat ad-hoc and based on intuition rather than a
clear learning principle. In this paper, we introduce a well-principled
Bayesian framework for SPN structure learning. First, we decompose the problem
into i) laying out a computational graph, and ii) learning the so-called scope
function over the graph. The first is rather unproblematic and akin to neural
network architecture validation. The second represents the effective structure
of the SPN and needs to respect the usual structural constraints of SPNs, i.e.
completeness and decomposability. While representing and learning the scope
function is somewhat involved in general, in this paper, we propose a natural
parametrisation for an important and widely used special case of SPNs. These
structural parameters are incorporated into a Bayesian model, such that
simultaneous structure and parameter learning is cast into monolithic Bayesian
posterior inference. In various experiments, our Bayesian SPNs often improve
test likelihoods over greedy SPN learners. Further, since the Bayesian
framework protects against overfitting, we can evaluate hyper-parameters
directly on the Bayesian model score, waiving the need for a separate
validation set, which is especially beneficial in low data regimes. Bayesian
SPNs can be applied to heterogeneous domains and can easily be extended to
nonparametric formulations. Moreover, our Bayesian approach is the first that
consistently and robustly learns SPN structures under missing data.
Comment: NeurIPS 2019; see conference page for supplement
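To make the two structural constraints mentioned above concrete, here is a small Python sketch with illustrative node classes: a sum node is complete when all of its children share the same scope, and a product node is decomposable when its children have pairwise disjoint scopes.

class Leaf:
    def __init__(self, var):
        self.scope = frozenset([var])

class Sum:
    def __init__(self, children):
        # Completeness: all children must have identical scopes.
        assert len({c.scope for c in children}) == 1, "not complete"
        self.scope = children[0].scope
        self.children = children

class Product:
    def __init__(self, children):
        # Decomposability: child scopes must be pairwise disjoint.
        scopes = [c.scope for c in children]
        union = frozenset().union(*scopes)
        assert sum(map(len, scopes)) == len(union), "not decomposable"
        self.scope = union
        self.children = children

# A valid (complete and decomposable) structure over variables 0 and 1.
spn = Sum([Product([Leaf(0), Leaf(1)]),
           Product([Leaf(0), Leaf(1)])])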