Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
A classic approach for learning Bayesian networks from data is to identify a
maximum a posteriori (MAP) network structure. In the case of discrete Bayesian
networks, MAP networks are selected by maximising one of several possible
Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet
equivalent uniform (BDeu) score from Heckerman et al. (1995). The key
properties of BDeu arise from its uniform prior over the parameters of each
local distribution in the network: it makes structure learning computationally
efficient, it requires no elicitation of prior knowledge from experts, and it
satisfies score equivalence.
In this paper we will review the derivation and the properties of BD scores,
and of BDeu in particular, and we will link them to the corresponding entropy
estimates to study them from an information theoretic perspective. To this end,
we will work in the context of the foundational work of Giffin and Caticha
(2007), who showed that Bayesian inference can be framed as a particular case
of the maximum relative entropy principle. We will use this connection to show
that BDeu should not be used for structure learning from sparse data, since it
violates the maximum relative entropy principle; and that it is also
problematic from a more classic Bayesian model selection perspective, because
it produces Bayes factors that are sensitive to the value of its only
hyperparameter. Using a large simulation study, we found in our previous work
(Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide
better accuracy in structure learning; in this paper we further show that BDs
does not suffer from the issues above, and we recommend using it instead of
BDeu for sparse data. Finally, we will show that these issues are in fact
different aspects of the same problem and a consequence of the distributional
assumptions of the prior.
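To make the score concrete, here is a minimal sketch (illustrative code, not
from the paper) of the BDeu local marginal likelihood in its standard closed
form, where a node with r states and q parent configurations receives a
uniform prior mass of alpha/(r*q) per cell:

    import numpy as np
    from scipy.special import gammaln

    def bdeu_local_score(counts, alpha=1.0):
        """Log BDeu marginal likelihood of one node given its parents.

        counts: (q, r) array of counts n_ijk, one row per parent
                configuration j, one column per node state k.
        alpha:  the imaginary sample size, BDeu's only hyperparameter.
        """
        q, r = counts.shape
        a_j = alpha / q                # prior mass per parent configuration
        a_jk = alpha / (q * r)         # uniform prior mass per cell (BDeu)
        n_j = counts.sum(axis=1)       # n_ij = sum_k n_ijk
        score = np.sum(gammaln(a_j) - gammaln(a_j + n_j))
        score += np.sum(gammaln(a_jk + counts) - gammaln(a_jk))
        return score

The BDs score mentioned above differs only in the prior: it spreads alpha over
the parent configurations actually observed in the data rather than over all q
of them, which is what defuses the sparse-data behaviour criticised here.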
Combinatorial Information Theory: I. Philosophical Basis of Cross-Entropy and Entropy
This study critically analyses the information-theoretic, axiomatic and
combinatorial philosophical bases of the entropy and cross-entropy concepts.
The combinatorial basis is shown to be the most fundamental (most primitive) of
these three bases, since it gives (i) a derivation for the Kullback-Leibler
cross-entropy and Shannon entropy functions, as simplified forms of the
multinomial distribution subject to the Stirling approximation; (ii) an
explanation for the need to maximize entropy (or minimize cross-entropy) to
find the most probable realization; and (iii) new, generalized definitions of
entropy and cross-entropy - supersets of the Boltzmann principle - applicable
to non-multinomial systems. The combinatorial basis is therefore of much
broader scope, with far greater power of application, than the
information-theoretic and axiomatic bases. The generalized definitions underpin
a new discipline of "combinatorial information theory", for the
analysis of probabilistic systems of any type.
Jaynes' generic formulation of statistical mechanics for multinomial systems
is re-examined in light of the combinatorial approach. (abbreviated abstract)
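The derivation at the heart of the combinatorial basis fits in a few lines.
For a multinomial system with N trials, source probabilities q_i and observed
frequencies p_i = n_i/N, the log-probability of a realization is, in LaTeX
notation,

    \frac{1}{N}\ln \mathbb{P}
      = \frac{1}{N}\ln\Bigl( N!\,\prod_{i=1}^{s} \frac{q_i^{n_i}}{n_i!} \Bigr)
      \;\approx\; -\sum_{i=1}^{s} p_i \ln\frac{p_i}{q_i}
      \;=\; -D(\mathbf{p}\,\|\,\mathbf{q}),

using the Stirling approximation as N grows large. The most probable
realization is therefore the one minimising the Kullback-Leibler
cross-entropy; for a uniform source (q_i = 1/s) this reduces to maximising the
Shannon entropy -\sum_i p_i \ln p_i, up to a constant. The generalized
definitions proposed in the paper keep the left-hand side but replace the
multinomial weight with the combinatorics of non-multinomial systems.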
How to estimate the differential acceleration in a two-species atom interferometer to test the equivalence principle
We propose a scheme for testing the weak equivalence principle (Universality
of Free Fall) using an atom-interferometric measurement of the local
differential acceleration between two atomic species with a large mass ratio as
test masses. An apparatus in free fall can be used to track atomic free-fall
trajectories over large distances. We show how the differential acceleration
can be extracted from the interferometric signal using Bayesian statistical
estimation, even in the case of a large mass and laser wavelength difference.
We show that this statistical estimation method does not suffer from
acceleration noise of the platform and does not require repeatable experimental
conditions. We specialize our discussion to a dual potassium/rubidium
interferometer and extend our protocol to other atomic mixtures. Finally, we
discuss the performance of the UFF test developed for the free-fall (0-g)
airplane in the ICE project (http://www.ice-space.fr).
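A toy version of the estimation idea can be sketched numerically. Assume (this
signal model and all numbers are ours, for illustration only) that each drop
yields one fringe reading per species, sharing an unknown vibration phase that
changes from shot to shot, while the differential phase phi_d of interest is
constant; a grid posterior then recovers phi_d without repeatable conditions:

    import numpy as np

    # Assumed signal model: y_Rb = cos(phi_vib) + noise,
    #                       y_K  = cos(kappa*phi_vib + phi_d) + noise,
    # with kappa the ratio of the two interferometer scale factors.
    rng = np.random.default_rng(0)
    kappa, phi_d_true, sigma = 3.0, 0.3, 0.05
    n_shots = 200
    phi_vib = rng.uniform(0, 2 * np.pi, n_shots)   # unrepeatable vibrations
    y_rb = np.cos(phi_vib) + sigma * rng.normal(size=n_shots)
    y_k = (np.cos(kappa * phi_vib + phi_d_true)
           + sigma * rng.normal(size=n_shots))

    # Grid-based Bayesian estimation: marginalize the nuisance phase shot by
    # shot, then accumulate the log-posterior of phi_d. Pure cosine fringes
    # leave a phi_d -> -phi_d sign ambiguity, so we search [0, pi] only.
    phi_d_grid = np.linspace(0, np.pi, 361)
    phi_v_grid = np.linspace(0, 2 * np.pi, 720)
    log_post = np.zeros_like(phi_d_grid)
    for y1, y2 in zip(y_rb, y_k):
        r1 = (y1 - np.cos(phi_v_grid)) ** 2
        r2 = (y2 - np.cos(kappa * phi_v_grid + phi_d_grid[:, None])) ** 2
        like = np.exp(-(r1 + r2) / (2 * sigma**2)).mean(axis=1)
        log_post += np.log(like + 1e-300)

    print("posterior mode:", phi_d_grid[np.argmax(log_post)])  # approx 0.3

The point the abstract makes appears directly: the common-mode vibration phase
is integrated out anew on every shot, so platform acceleration noise washes
out of the estimate instead of corrupting it.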
Learning the Irreducible Representations of Commutative Lie Groups
We present a new probabilistic model of compact commutative Lie groups that
produces invariant-equivariant and disentangled representations of data. To
define the notion of disentangling, we borrow a fundamental principle from
physics that is used to derive the elementary particles of a system from its
symmetries. Our model employs a newly derived Bayesian conjugacy relation that
enables fully tractable probabilistic inference over compact commutative Lie
groups -- a class that includes the groups that describe the rotation and
cyclic translation of images. We train the model on pairs of transformed image
patches, and show that the learned invariant representation is highly effective
for classification.
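The group-representation fact the model rests on is easy to check numerically:
for the cyclic translation group Z_N, the discrete Fourier transform
block-diagonalizes the group action into one-dimensional complex irreducible
representations, where a shift by s acts on frequency k as the phase
exp(-2*pi*i*k*s/N). A quick demonstration (ours, not the paper's model):

    import numpy as np

    N, s = 8, 3
    x = np.random.default_rng(1).normal(size=N)  # a signal on Z_N
    x_shift = np.roll(x, s)                      # group action in pixel space

    # The same action in Fourier space is diagonal: each frequency k picks up
    # the phase of the 1-D irreducible representation rho_k(s).
    k = np.arange(N)
    X = np.fft.fft(x)
    assert np.allclose(np.fft.fft(x_shift),
                       np.exp(-2j * np.pi * k * s / N) * X)

    # Invariant ("disentangled") part: magnitudes; equivariant part: phases.
    assert np.allclose(np.abs(np.fft.fft(x_shift)), np.abs(X))

This magnitude/phase split is, roughly, the deterministic skeleton of the
invariant-equivariant representations that the model learns probabilistically.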
Bayesian reconstruction of the cosmological large-scale structure: methodology, inverse algorithms and numerical optimization
We address the inverse problem of cosmic large-scale structure reconstruction
from a Bayesian perspective. For a linear data model, a number of known and
novel reconstruction schemes, which differ in terms of the underlying signal
prior, data likelihood, and numerical inverse extra-regularization schemes are
derived and classified. The Bayesian methodology presented in this paper tries
to unify and extend the following methods: Wiener-filtering, Tikhonov
regularization, Ridge regression, Maximum Entropy, and inverse regularization
techniques. The inverse techniques considered here are the asymptotic
regularization, the Jacobi, Steepest Descent, Newton-Raphson,
Landweber-Fridman, and both linear and non-linear Krylov methods based on
Fletcher-Reeves, Polak-Ribiere, and Hestenes-Stiefel Conjugate Gradients. The
structures of the best-performing algorithms to date are presented, based
on an operator scheme, which permits one to exploit the power of fast Fourier
transforms. Using such an implementation of the generalized Wiener-filter in
the novel ARGO-software package, the different numerical schemes are
benchmarked with 1-, 2-, and 3-dimensional problems including structured white
and Poissonian noise, data windowing and blurring effects. A novel numerical
Krylov scheme is shown to be superior in terms of performance and fidelity.
These fast inverse methods ultimately will enable the application of sampling
techniques to explore complex joint posterior distributions. We outline how the
space of the dark-matter density field, the peculiar velocity field, and the
power spectrum can jointly be investigated by a Gibbs-sampling process. Such a
method can be applied to correct for redshift distortions in the observed
galaxies and for time-reversal reconstructions of the initial density field.
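A minimal operator-style sketch of the Wiener filter at the core of these
schemes (our toy setup with assumed spectra, not the ARGO implementation): for
data d = s + n with signal and noise power spectra P_s and P_n, the posterior
mean is diagonal in Fourier space, s_WF = F^{-1}[ P_s/(P_s + P_n) F d ], which
is where the fast Fourier transforms enter:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 512
    k = np.fft.rfftfreq(n)
    k[0] = k[1]                              # regularize the zero mode

    # Assumed spectra: red signal P_s ~ k^-2, white noise of std sigma.
    P_s = k ** -2.0
    sigma = 4.0
    P_n = np.full_like(P_s, sigma ** 2)

    # Draw a signal from the prior and add noise: d = s + e.
    s = np.fft.irfft(np.sqrt(P_s) * np.fft.rfft(rng.normal(size=n)), n)
    d = s + sigma * rng.normal(size=n)

    # Wiener filter, diagonal in Fourier space.
    s_wf = np.fft.irfft(P_s / (P_s + P_n) * np.fft.rfft(d), n)

    print("rms error, raw data     :", np.sqrt(np.mean((d - s) ** 2)))
    print("rms error, Wiener filter:", np.sqrt(np.mean((s_wf - s) ** 2)))

The Krylov machinery the paper benchmarks becomes necessary precisely when
this diagonalization fails, i.e. when data windowing, blurring, or
inhomogeneous noise leave the signal and noise covariances with no common
eigenbasis, so that S(S+N)^{-1} must be applied iteratively.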
Model Selection Principles in Misspecified Models
Model selection is of fundamental importance to high dimensional modeling
featured in many contemporary applications. Classical principles of model
selection include the Kullback-Leibler divergence principle and the Bayesian
principle, which lead to the Akaike information criterion and Bayesian
information criterion when models are correctly specified. Yet model
misspecification is unavoidable when we have no knowledge of the true model or
when we have the correct family of distributions but miss some true predictor.
In this paper, we propose a family of semi-Bayesian principles for model
selection in misspecified models, which combine the strengths of the two
well-known principles. We derive asymptotic expansions of the semi-Bayesian
principles in misspecified generalized linear models, which give the new
semi-Bayesian information criteria (SIC). A specific form of SIC admits a
natural decomposition into the negative maximum quasi-log-likelihood, a penalty
on model dimensionality, and a penalty on model misspecification directly.
Numerical studies demonstrate the advantage of the newly proposed SIC
methodology for model selection in both correctly specified and misspecified
models.
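The decomposition described admits a compact generic form. As an illustration
(our paraphrase; the paper's exact SIC expressions differ in detail), a
criterion in this spirit for a candidate model M is, in LaTeX notation,

    \mathrm{SIC}(M) \;=\;
      \underbrace{-2\,\ell_n(\hat\beta_M)}_{\text{quasi-log-likelihood fit}}
      \;+\; \underbrace{|M|\log n}_{\text{dimensionality}}
      \;+\; \underbrace{\operatorname{tr}\bigl(\hat H_n^{-1}\hat K_n\bigr)}
            _{\text{misspecification}},

where \hat H_n is the negative Hessian of the quasi-log-likelihood and
\hat K_n the outer-product (sandwich) estimate of the score covariance, both
evaluated at \hat\beta_M. Under correct specification the information-matrix
equality gives H = K, the trace collapses to |M|, and the criterion behaves
like BIC; the excess of the trace over |M| is what penalizes misspecification
directly.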
Robust adaptive beamforming using a Bayesian steering vector error model
We propose a Bayesian approach to robust adaptive beamforming which entails
considering the steering vector of interest as a random variable with some
prior distribution. The latter can be tuned in a simple way to reflect how far
the actual steering vector is from its presumed value. Two different priors
are proposed, namely a Bingham prior distribution and a distribution that
directly reveals and depends upon the angle between the true and presumed
steering vector. In addition, a non-informative prior is assigned to the
interference plus noise covariance matrix R, which can be viewed as a means to
introduce diagonal loading in a Bayesian framework. The minimum mean square
distance estimate of the steering vector as well as the minimum mean square
error estimate of R are derived and implemented using a Gibbs sampling
strategy. Numerical simulations show that the new beamformers possess a very
good rate of convergence even in the presence of steering vector errors.
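The diagonal-loading connection can be made concrete with the simplest
non-Bayesian limit of the approach: a loaded Capon beamformer, where adding a
scaled identity to the sample covariance plays the role of the non-informative
prior on R. A toy sketch (all geometry and numbers assumed by us, and no Gibbs
sampling here):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n_snap = 8, 64                         # sensors, snapshots

    def steering(theta, m):
        # Plane-wave steering vector of a half-wavelength-spaced ULA.
        return np.exp(1j * np.pi * np.arange(m) * np.sin(theta))

    theta_true = np.deg2rad(10.0)             # actual direction of arrival
    a_pres = steering(np.deg2rad(13.0), m)    # presumed (mismatched) steering

    # Snapshots: desired signal + one interferer + complex white noise.
    src = steering(theta_true, m)
    jam = steering(np.deg2rad(-40.0), m)
    X = (np.outer(src, 2.0 * rng.normal(size=n_snap))
         + np.outer(jam, 4.0 * rng.normal(size=n_snap))
         + (rng.normal(size=(m, n_snap))
            + 1j * rng.normal(size=(m, n_snap))) / np.sqrt(2))
    R_hat = X @ X.conj().T / n_snap           # sample covariance

    def capon(R, a):
        w = np.linalg.solve(R, a)
        return w / (a.conj() @ w)             # distortionless toward a

    for name, R in [("plain ", R_hat), ("loaded", R_hat + 10 * np.eye(m))]:
        w = capon(R, a_pres)
        print(name, "gain toward true source:",
              round(abs(w.conj() @ src) ** 2, 3))

With steering mismatch, the plain beamformer tends to cancel the desired
signal, while the loaded one keeps most of its gain; the Bayesian estimators
in the paper obtain the loading level, and the steering correction itself,
from the posterior rather than by hand-tuning.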