Bayesian Structural Inference for Hidden Processes
We introduce a Bayesian approach to discovering patterns in structurally
complex processes. The proposed method of Bayesian Structural Inference (BSI)
relies on a set of candidate unifilar HMM (uHMM) topologies for inference of
process structure from a data series. We employ a recently developed exact
enumeration of topological epsilon-machines. (A sequel then removes the
topological restriction.) This subset of the uHMM topologies has the added
benefit that inferred models are guaranteed to be epsilon-machines,
irrespective of estimated transition probabilities. Properties of
epsilon-machines and uHMMs allow for the derivation of analytic expressions for
estimating transition probabilities, inferring start states, and comparing the
posterior probability of candidate model topologies, despite process internal
structure being only indirectly present in data. We demonstrate BSI's
effectiveness in estimating a process's randomness, as reflected by the Shannon
entropy rate, and its structure, as quantified by the statistical complexity.
We also compare point estimation using the full posterior distribution over
candidate models against using the single, maximum a posteriori model, and show
that the former more accurately reflects uncertainty in estimated values. We apply BSI
to in-class examples of finite- and infinite-order Markov processes, as well as
to an out-of-class, infinite-state hidden process.
Comment: 20 pages, 11 figures, 1 table; supplementary materials, 15 pages, 11 figures, 6 tables; http://csc.ucdavis.edu/~cmg/compmech/pubs/bsihp.ht
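
As a rough illustration of the evidence computation BSI rests on: for a unifilar topology, a chosen start state fixes the entire state path, so with independent Dirichlet priors on each state's outgoing symbol distribution the marginal likelihood has a closed Dirichlet-multinomial form. The Python sketch below is ours, not the paper's code; the topology encoding, the uniform prior over start states, and placing a prior over the full alphabet at every state are simplifying assumptions.

    import numpy as np
    from scipy.special import gammaln

    def log_evidence(topology, data, alpha=1.0):
        # topology: dict mapping (state, symbol) -> next state.  Unifilarity
        # means a start state fixes the whole state path, so the transition
        # counts below are deterministic given the data.
        symbols = sorted({s for (_, s) in topology})
        states = sorted({q for (q, _) in topology})
        per_start = []
        for start in states:  # marginalise over the unknown start state
            counts = {q: np.zeros(len(symbols)) for q in states}
            q, generable = start, True
            for x in data:
                if (q, x) not in topology:  # data not generable from this start
                    generable = False
                    break
                counts[q][symbols.index(x)] += 1
                q = topology[(q, x)]
            if not generable:
                continue
            # Dirichlet-multinomial evidence: one closed-form term per state.
            # (Simplification: the prior at every state ranges over the full
            # alphabet, even where some symbols have no outgoing edge.)
            le = 0.0
            for q in states:
                n, a = counts[q], np.full(len(symbols), alpha)
                le += gammaln(a.sum()) - gammaln(a.sum() + n.sum())
                le += np.sum(gammaln(a + n) - gammaln(a))
            per_start.append(le - np.log(len(states)))  # uniform start prior
        return np.logaddexp.reduce(per_start) if per_start else -np.inf

Comparing candidate topologies then reduces to comparing these log evidences plus a log prior per topology, which is the posterior comparison the abstract describes.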
Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
A classic approach for learning Bayesian networks from data is to identify a
maximum a posteriori (MAP) network structure. In the case of discrete Bayesian
networks, MAP networks are selected by maximising one of several possible
Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet
equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties
of BDeu arise from its uniform prior over the parameters of each local
distribution in the network: it makes structure learning computationally
efficient, it does not require the elicitation of prior knowledge from experts,
and it satisfies score equivalence.
In this paper we will review the derivation and the properties of BD scores,
and of BDeu in particular, and we will link them to the corresponding entropy
estimates to study them from an information theoretic perspective. To this end,
we will work in the context of the foundational work of Giffin and Caticha
(2007), who showed that Bayesian inference can be framed as a particular case
of the maximum relative entropy principle. We will use this connection to show
that BDeu should not be used for structure learning from sparse data, since it
violates the maximum relative entropy principle; and that it is also
problematic from a more classic Bayesian model selection perspective, because
it produces Bayes factors that are sensitive to the value of its only
hyperparameter. Using a large simulation study, we found in our previous work
(Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide
better accuracy in structure learning; in this paper we further show that BDs
does not suffer from the issues above, and we recommend using it for sparse
data instead of BDeu. Finally, we will show that these issues are in fact
different aspects of the same problem and a consequence of the distributional
assumptions of the prior.
Comment: 20 pages, 4 figures; extended version submitted to Behaviormetrik
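
For concreteness, a BD score evaluates each variable through ratios of Gamma functions of prior and posterior Dirichlet counts, and BDeu spreads a single imaginary sample size uniformly over all cells. A minimal sketch of the local BDeu score, assuming the contingency counts are already tabulated (function name and count layout are illustrative):

    import numpy as np
    from scipy.special import gammaln

    def bdeu_local_score(counts, ess=1.0):
        # counts: array of shape (q, r) holding n_ijk for one variable, with
        # rows j = parent configurations and columns k = states of the child;
        # ess is the imaginary sample size, the score's only hyperparameter.
        q, r = counts.shape
        a_ijk = ess / (q * r)   # uniform prior mass per cell
        a_ij = ess / q          # prior mass per parent configuration
        n_ij = counts.sum(axis=1)
        score = np.sum(gammaln(a_ij) - gammaln(a_ij + n_ij))
        score += np.sum(gammaln(a_ijk + counts) - gammaln(a_ijk))
        return score

The network score is the sum of these local scores over nodes. The abstract's criticism can be glimpsed here: with sparse data many parent configurations are never observed, yet prior mass is still spread over them, so the effective prior varies with the structure being scored and the resulting Bayes factors become sensitive to ess.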
Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency
The Bayesian formulation of inverse problems is attractive for three primary
reasons: it provides a clear modelling framework, a means for uncertainty
quantification, and principled learning of hyperparameters. The
posterior distribution may be explored by sampling methods, but for many
problems it is computationally infeasible to do so. In this situation maximum a
posteriori (MAP) estimators are often sought. Whilst these are relatively cheap
to compute, and have an attractive variational formulation, a key drawback is
their lack of invariance under change of parameterization. This is a
particularly significant issue when hierarchical priors are employed to learn
hyperparameters. In this paper we study the effect of the choice of
parameterization on MAP estimators when a conditionally Gaussian hierarchical
prior distribution is employed. Specifically we consider the centred
parameterization, the natural parameterization in which the unknown state is
solved for directly, and the noncentred parameterization, which works with a
whitened Gaussian as the unknown state variable, and arises when considering
dimension-robust MCMC algorithms; MAP estimation is well-defined in the
nonparametric setting only for the noncentred parameterization. However, we
show that MAP estimates based on the noncentred parameterization are not
consistent as estimators of hyperparameters; conversely, we show that limits of
finite-dimensional centred MAP estimators are consistent as the dimension tends
to infinity. We also consider empirical Bayesian hyperparameter estimation,
show consistency of these estimates, and demonstrate that they are more robust
with respect to noise than centred MAP estimates. An underpinning concept
throughout is that hyperparameters may only be recovered up to measure
equivalence, a well-known phenomenon in the context of the Ornstein-Uhlenbeck
process.
Comment: 36 pages, 8 figures
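
To make the two parameterizations concrete, here is a toy finite-dimensional sketch assuming u | theta ~ N(0, theta^2 I) and observations y = u + Gaussian noise; the model and names are illustrative, not the paper's nonparametric setting.

    import numpy as np

    def centred_objective(u, theta, y, sigma=0.1):
        # J(u, theta) = ||y - u||^2 / (2 sigma^2) + ||u||^2 / (2 theta^2)
        #               + n log(theta), the last term coming from the prior's
        #               normalising constant, which couples u and theta.
        n = len(u)
        return (np.sum((y - u) ** 2) / (2 * sigma**2)
                + np.sum(u**2) / (2 * theta**2)
                + n * np.log(theta))

    def noncentred_objective(xi, theta, y, sigma=0.1):
        # Reparameterise u = theta * xi with xi ~ N(0, I): the prior on xi
        # no longer depends on theta, so the log-determinant term vanishes
        # and the objective remains well defined as the dimension grows.
        u = theta * xi
        return np.sum((y - u) ** 2) / (2 * sigma**2) + np.sum(xi**2) / 2

The n log(theta) term is what the noncentred form trades away: dropping it is what makes noncentred MAP well defined in the infinite-dimensional limit, at the price of the hyperparameter inconsistency the abstract describes.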
An Empirical-Bayes Score for Discrete Bayesian Networks
Bayesian network structure learning is often performed in a Bayesian setting,
by evaluating candidate structures using their posterior probabilities for a
given data set. Score-based algorithms then use those posterior probabilities
as an objective function and return the maximum a posteriori network as the
learned model. For discrete Bayesian networks, the canonical choice for a
posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal
likelihood with a uniform (U) graph prior (Heckerman et al., 1995). Its
favourable theoretical properties descend from assuming a uniform prior both on
the space of the network structures and on the space of the parameters of the
network. In this paper, we revisit the limitations of these assumptions, and we
introduce an alternative set of assumptions and the resulting score: the
Bayesian Dirichlet sparse (BDs) empirical Bayes marginal likelihood with a
marginal uniform (MU) graph prior. We evaluate its performance in an extensive
simulation study, showing that MU+BDs is more accurate than U+BDeu both in
learning the structure of the network and in predicting new observations, while
not being computationally more complex to estimate.
Comment: 12 pages, PGM 201
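
As a companion to the BDeu sketch above: as we read its construction, BDs spreads the imaginary sample size only over the parent configurations actually observed in the data, with unobserved configurations receiving no prior mass. A minimal sketch under that reading (count layout as before, names illustrative):

    import numpy as np
    from scipy.special import gammaln

    def bds_local_score(counts, ess=1.0):
        # counts: (q, r) array of n_ijk, laid out as in the BDeu sketch;
        # assumes at least one parent configuration has been observed.
        n_ij = counts.sum(axis=1)
        seen = n_ij > 0                 # parent configs observed in the data
        q_tilde, r = int(seen.sum()), counts.shape[1]
        a_ijk = ess / (q_tilde * r)     # prior mass only on observed configs
        a_ij = ess / q_tilde
        score = np.sum(gammaln(a_ij) - gammaln(a_ij + n_ij[seen]))
        score += np.sum(gammaln(a_ijk + counts[seen]) - gammaln(a_ijk))
        return score

Because the prior mass tracks the observed configurations, the effective imaginary sample size no longer depends on how many cells happen to be empty, which is the property the simulation study exercises.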
Log-Concave Duality in Estimation and Control
In this paper we generalize the estimation-control duality that exists in the
linear-quadratic-Gaussian setting. We extend this duality to maximum a
posteriori estimation of the system's state, where the measurement and
dynamical system noise are independent log-concave random variables. More
generally, we show that a problem which induces a convex penalty on noise terms
will have a dual control problem. We provide conditions for strong duality to
hold, and then prove relaxed conditions for the piecewise linear-quadratic
case. The results have applications in estimation problems with nonsmooth
densities, such as log-concave maximum likelihood densities. We conclude with
an example reconstructing optimal estimates from solutions to the dual control
problem, which has implications for sharing solution methods between the two
types of problems.
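
On the estimation side of this duality, the primal problem is a convex program: MAP estimation minimises the sum of the noise terms' negative log-densities, which are convex exactly when the densities are log-concave. A toy scalar sketch, assuming Laplace process noise (log-concave and nonsmooth) and Gaussian measurement noise; the model and names are ours, not the paper's:

    import numpy as np
    from scipy.optimize import minimize

    # Scalar toy system x_{t+1} = a x_t + w_t, y_t = x_t + v_t.
    def neg_log_posterior(x, y, a=0.9, lam=1.0, sigma=0.5):
        w = x[1:] - a * x[:-1]   # implied process noise (Laplace penalty)
        v = y - x                # implied measurement noise (Gaussian penalty)
        return lam * np.sum(np.abs(w)) + np.sum(v**2) / (2 * sigma**2)

    y = np.array([0.1, 0.3, 1.2, 1.0, 0.8])
    res = minimize(neg_log_posterior, x0=y.copy(), args=(y,),
                   method="Nelder-Mead")
    print(res.x)                 # MAP state trajectory estimate

In the all-Gaussian special case the penalties are quadratic and the dual is the classical LQG control problem; the paper's contribution is that a dual control problem persists for general convex penalties such as the nonsmooth one above.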