
    Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle

    A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network: it makes structure learning computationally efficient, it does not require the elicitation of prior knowledge from experts, and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information-theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it for sparse data instead of BDeu. Finally, we will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.
    Comment: 20 pages, 4 figures; extended version submitted to Behaviormetrika
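
    The score family discussed above can be made concrete with a short sketch. The snippet below is my own illustration, not the authors' code; the count table, the imaginary sample size `iss`, and the function names are assumptions. It computes the local log-BDeu score, which spreads the imaginary sample size uniformly over all parent-configuration/state cells, and the local log-BDs score of Scutari (2016), which spreads it only over the parent configurations actually observed, which is the difference that matters for sparse data.

    ```python
    # A minimal sketch (not the papers' code) of BD-family local scores for one
    # discrete node, computed from a (parent configurations x states) count table.
    import numpy as np
    from scipy.special import gammaln

    def log_bd_local(counts, alpha):
        """Generic Bayesian-Dirichlet local marginal likelihood (log scale)."""
        a_j, n_j = alpha.sum(axis=1), counts.sum(axis=1)
        return (np.sum(gammaln(a_j) - gammaln(a_j + n_j))
                + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

    def log_bdeu_local(counts, iss=1.0):
        """BDeu: imaginary sample size iss spread uniformly over all q*r cells."""
        q, r = counts.shape
        return log_bd_local(counts, np.full((q, r), iss / (q * r)))

    def log_bds_local(counts, iss=1.0):
        """BDs (Scutari, 2016): iss spread only over the parent configurations
        actually observed in the data; unobserved configurations contribute nothing."""
        observed = counts.sum(axis=1) > 0
        sub = counts[observed]
        q_tilde, r = sub.shape
        return log_bd_local(sub, np.full((q_tilde, r), iss / (q_tilde * r)))

    # Sparse data: one of the four parent configurations is never observed.
    counts = np.array([[5, 1], [0, 0], [2, 6], [3, 3]])
    print(log_bdeu_local(counts), log_bds_local(counts))
    ```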

    Combinatorial Information Theory: I. Philosophical Basis of Cross-Entropy and Entropy

    This study critically analyses the information-theoretic, axiomatic and combinatorial philosophical bases of the entropy and cross-entropy concepts. The combinatorial basis is shown to be the most fundamental (most primitive) of these three bases, since it gives (i) a derivation of the Kullback-Leibler cross-entropy and Shannon entropy functions, as simplified forms of the multinomial distribution subject to the Stirling approximation; (ii) an explanation for the need to maximize entropy (or minimize cross-entropy) to find the most probable realization; and (iii) new, generalized definitions of entropy and cross-entropy - supersets of the Boltzmann principle - applicable to non-multinomial systems. The combinatorial basis is therefore of much broader scope, with far greater power of application, than the information-theoretic and axiomatic bases. The generalized definitions underpin a new discipline of "combinatorial information theory" for the analysis of probabilistic systems of any type. Jaynes' generic formulation of statistical mechanics for multinomial systems is re-examined in light of the combinatorial approach. (Abbreviated abstract.)
    Comment: 45 pp; 1 figure; REVTeX; updated version 5 (incremental changes)
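
    The combinatorial claim in point (i) is easy to check numerically. The sketch below is my own illustration (the probabilities p and q are arbitrary assumptions): the normalised log-probability of a multinomial outcome, (1/N) log P(n | N, q), approaches the negative Kullback-Leibler cross-entropy -D_KL(p || q) of the observed frequencies p = n/N as N grows, exactly as the Stirling approximation predicts.

    ```python
    # Numeric illustration (my own sketch, not from the paper) of the combinatorial
    # basis: (1/N) * log multinomial(n | N, q) -> -D_KL(p || q) as N grows.
    import numpy as np
    from scipy.stats import multinomial
    from scipy.special import rel_entr

    q = np.array([0.5, 0.3, 0.2])       # source / prior probabilities
    p = np.array([0.4, 0.4, 0.2])       # target relative frequencies

    for N in (10**2, 10**4, 10**6):
        n = (p * N).astype(int)
        n[-1] = N - n[:-1].sum()        # make the counts sum exactly to N
        p_hat = n / N
        log_pmf = multinomial.logpmf(n, N, q)
        kl = rel_entr(p_hat, q).sum()   # Kullback-Leibler divergence D_KL(p_hat || q)
        print(N, log_pmf / N, -kl)
    ```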

    How to estimate the differential acceleration in a two-species atom interferometer to test the equivalence principle

    We propose a scheme for testing the weak equivalence principle (Universality of Free Fall) using an atom-interferometric measurement of the local differential acceleration between two atomic species with a large mass ratio as test masses. An apparatus in free fall can be used to track atomic free-fall trajectories over large distances. We show how the differential acceleration can be extracted from the interferometric signal using Bayesian statistical estimation, even in the case of a large mass and laser wavelength difference. We show that this statistical estimation method does not suffer from acceleration noise of the platform and does not require repeatable experimental conditions. We specialize our discussion to a dual potassium/rubidium interferometer and extend our protocol to other atomic mixtures. Finally, we discuss the performance of the UFF test developed for the free-fall (0-g) airplane in the ICE project (http://www.ice-space.fr).
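
    As a rough illustration of the estimation idea (this is my own toy model, not the analysis used in the ICE project; the wave number, interrogation time, noise level and grid are invented numbers), the sketch below recovers a differential acceleration from pairs of single-species fringe signals that share an unknown, shot-to-shot vibration phase, by evaluating the joint posterior on a grid and marginalising the common phase.

    ```python
    # Toy Bayesian estimation of a differential acceleration from correlated
    # two-species fringes sharing an unknown common vibration phase per shot.
    import numpy as np

    rng = np.random.default_rng(0)
    k, T = 1.6e7, 0.1                 # illustrative effective wave number [rad/m], time [s]
    delta_a_true = 5e-6               # differential acceleration to recover [m/s^2]
    shots, sigma = 200, 0.05          # number of shots, fringe detection noise
    phi_c = rng.uniform(0, 2 * np.pi, shots)          # uncontrolled platform phase
    s1 = 0.5 * (1 - np.cos(phi_c)) + sigma * rng.standard_normal(shots)
    s2 = 0.5 * (1 - np.cos(phi_c + k * T**2 * delta_a_true)) + sigma * rng.standard_normal(shots)

    # Grid posterior over delta_a, marginalising the common phase shot by shot.
    grid = np.linspace(0, 2e-5, 400)
    phis = np.linspace(0, 2 * np.pi, 256, endpoint=False)
    m1 = 0.5 * (1 - np.cos(phis))
    logpost = np.zeros_like(grid)
    for i, da in enumerate(grid):
        m2 = 0.5 * (1 - np.cos(phis + k * T**2 * da))
        ll = -((s1[:, None] - m1)**2 + (s2[:, None] - m2)**2) / (2 * sigma**2)
        lmax = ll.max(axis=1, keepdims=True)
        logpost[i] = np.sum(np.log(np.exp(ll - lmax).mean(axis=1)) + lmax[:, 0])
    print("estimated differential acceleration:", grid[np.argmax(logpost)])
    ```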

    Learning the Irreducible Representations of Commutative Lie Groups

    We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data. To define the notion of disentangling, we borrow a fundamental principle from physics that is used to derive the elementary particles of a system from its symmetries. Our model employs a newly derived Bayesian conjugacy relation that enables fully tractable probabilistic inference over compact commutative Lie groups -- a class that includes the groups that describe the rotation and cyclic translation of images. We train the model on pairs of transformed image patches, and show that the learned invariant representation is highly effective for classification.
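
    A small sanity check (my own sketch, unrelated to the paper's training code) of the underlying representation theory: for the cyclic translation group, the Fourier transform block-diagonalises the group action, so shifting a signal multiplies its n-th Fourier coefficient by a unit phase, the irreducible representation at frequency n.

    ```python
    # Illustration (my own, not the paper's model): cyclic translation acts on the
    # n-th Fourier coefficient as multiplication by exp(-2j*pi*n*s/N), i.e. a 2x2
    # rotation block in real coordinates.
    import numpy as np

    rng = np.random.default_rng(1)
    N, s = 16, 5                          # signal length and cyclic shift
    x = rng.standard_normal(N)

    shifted = np.roll(x, s)               # group action in signal space
    X = np.fft.fft(x)
    phases = np.exp(-2j * np.pi * np.arange(N) * s / N)   # per-frequency irreducible action
    print(np.allclose(np.fft.fft(shifted), phases * X))   # True
    ```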

    Bayesian reconstruction of the cosmological large-scale structure: methodology, inverse algorithms and numerical optimization

    We address the inverse problem of cosmic large-scale structure reconstruction from a Bayesian perspective. For a linear data model, a number of known and novel reconstruction schemes, which differ in terms of the underlying signal prior, data likelihood, and numerical inverse extra-regularization, are derived and classified. The Bayesian methodology presented in this paper tries to unify and extend the following methods: Wiener filtering, Tikhonov regularization, Ridge regression, Maximum Entropy, and inverse regularization techniques. The inverse techniques considered here are the asymptotic regularization, the Jacobi, Steepest Descent, Newton-Raphson, Landweber-Fridman, and both linear and non-linear Krylov methods based on Fletcher-Reeves, Polak-Ribiere, and Hestenes-Stiefel Conjugate Gradients. The structures of the up-to-date highest-performing algorithms are presented, based on an operator scheme which permits one to exploit the power of fast Fourier transforms. Using such an implementation of the generalized Wiener filter in the novel ARGO software package, the different numerical schemes are benchmarked with 1-, 2-, and 3-dimensional problems including structured white and Poissonian noise, data windowing and blurring effects. A novel numerical Krylov scheme is shown to be superior in terms of performance and fidelity. These fast inverse methods will ultimately enable the application of sampling techniques to explore complex joint posterior distributions. We outline how the space of the dark-matter density field, the peculiar velocity field, and the power spectrum can jointly be investigated by a Gibbs-sampling process. Such a method can be applied to correct for redshift distortions in the observed galaxies and to perform time-reversal reconstructions of the initial density field.
    Comment: 40 pages, 11 figures
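
    A minimal sketch of one ingredient of such schemes (my own toy, not the ARGO implementation; the 1D grid, power spectrum and noise level are invented) is the FFT-based Wiener filter solved with a Krylov method: the conjugate-gradient solver only needs the operator S^-1 + N^-1 applied to a vector, with the signal covariance S diagonal in Fourier space and the noise covariance N diagonal in pixel space.

    ```python
    # Toy FFT-based, conjugate-gradient Wiener filter in 1D:
    # solve (S^-1 + N^-1) s = N^-1 d for the Wiener-filtered signal s.
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(2)
    n = 256
    k = np.fft.fftfreq(n) * n
    power = 1.0 / (1.0 + (np.abs(k) / 8.0)**2)       # illustrative signal power spectrum
    noise_var = 0.1 * np.ones(n)                     # white noise variance per pixel

    # Illustrative data: a smooth signal plus white Gaussian noise.
    signal = np.fft.ifft(np.sqrt(power) * np.fft.fft(rng.standard_normal(n))).real
    data = signal + np.sqrt(noise_var) * rng.standard_normal(n)

    def apply_lhs(s):
        """(S^-1 + N^-1) s, with S diagonalised by the FFT and N diagonal in pixels."""
        s = np.ravel(s)
        s_prior = np.fft.ifft(np.fft.fft(s) / power).real
        return s_prior + s / noise_var

    A = LinearOperator((n, n), matvec=apply_lhs, dtype=float)
    s_wf, info = cg(A, data / noise_var)
    print("CG converged:", info == 0)
    ```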

    Model Selection Principles in Misspecified Models

    Model selection is of fundamental importance to high-dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Kullback-Leibler divergence principle and the Bayesian principle, which lead to the Akaike information criterion and the Bayesian information criterion when models are correctly specified. Yet model misspecification is unavoidable when we have no knowledge of the true model or when we have the correct family of distributions but miss some true predictors. In this paper, we propose a family of semi-Bayesian principles for model selection in misspecified models, which combine the strengths of the two well-known principles. We derive asymptotic expansions of the semi-Bayesian principles in misspecified generalized linear models, which give the new semi-Bayesian information criteria (SIC). A specific form of SIC admits a natural decomposition into the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty directly on model misspecification. Numerical studies demonstrate the advantage of the newly proposed SIC methodology for model selection in both correctly specified and misspecified models.
    Comment: 25 pages, 6 tables
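
    The third ingredient of that decomposition, a data-driven penalty that reacts to misspecification, can be illustrated with a toy sandwich-type calculation. This is my own Takeuchi-style example, not the exact SIC formula of the paper; the data-generating mixture and the constant 2 are assumptions. For an i.i.d. Poisson fit, the ratio tr(H^-1 V) of score variability to sensitivity equals the dispersion index and inflates the criterion when the counts are overdispersed.

    ```python
    # Toy illustration of a misspecification penalty: for an i.i.d. Poisson fit,
    # tr(H^-1 V) is the dispersion index and exceeds 1 under overdispersion.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 5000
    # Overdispersed counts: a gamma-mixed Poisson, misspecified as plain Poisson.
    lam = rng.gamma(shape=2.0, scale=2.0, size=n)
    x = rng.poisson(lam)

    mle = x.mean()                                   # Poisson MLE of the mean
    score = x / mle - 1.0                            # per-observation score at the MLE
    H = np.sum(x) / mle**2                           # sensitivity (negative Hessian)
    V = np.sum(score**2)                             # score variability
    quasi_loglik = np.sum(x * np.log(mle) - mle)     # up to the x!-term, constant in lam

    p = 1                                            # model dimension
    crit_naive = -2 * quasi_loglik + p * np.log(n)               # BIC-style, ignores misspecification
    crit_robust = -2 * quasi_loglik + p * np.log(n) + 2 * V / H  # adds a sandwich-type penalty
    print("tr(H^-1 V) =", V / H, "(~1 if the Poisson model were correct)")
    print(crit_naive, crit_robust)
    ```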

    Robust adaptive beamforming using a Bayesian steering vector error model

    We propose a Bayesian approach to robust adaptive beamforming which entails considering the steering vector of interest as a random variable with some prior distribution. The latter can be tuned in a simple way to reflect how far the actual steering vector is from its presumed value. Two different priors are proposed, namely a Bingham prior distribution and a distribution that directly depends upon the angle between the true and presumed steering vectors. Additionally, a non-informative prior is assigned to the interference-plus-noise covariance matrix R, which can be viewed as a means to introduce diagonal loading in a Bayesian framework. The minimum mean square distance estimate of the steering vector as well as the minimum mean square error estimate of R are derived and implemented using a Gibbs sampling strategy. Numerical simulations show that the new beamformers possess a very good rate of convergence even in the presence of steering vector errors.
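
    For context, the classical counterpart that the abstract alludes to, diagonal loading, can be sketched in a few lines. This is my own illustration, not the proposed Gibbs sampler, and the array geometry, angles and loading level are invented: an MVDR (Capon) beamformer computed from a loaded sample covariance stays well behaved under a mismatch between the true and presumed steering vectors.

    ```python
    # Baseline sketch (not the paper's method): diagonally loaded MVDR beamforming
    # for a uniform linear array with a mismatched presumed steering vector.
    import numpy as np

    def steering(theta_deg, m, spacing=0.5):
        """Plane-wave steering vector for an m-element uniform linear array."""
        return np.exp(2j * np.pi * spacing * np.arange(m) * np.sin(np.deg2rad(theta_deg)))

    def loaded_mvdr_weights(R, a, loading=1.0):
        """w = (R + loading*I)^-1 a / (a^H (R + loading*I)^-1 a)."""
        x = np.linalg.solve(R + loading * np.eye(R.shape[0]), a)
        return x / (a.conj() @ x)

    rng = np.random.default_rng(3)
    m, snaps = 10, 200
    a_true = steering(12.0, m)                    # actual source direction
    a_presumed = steering(10.0, m)                # mismatched presumed steering vector
    interf = steering(-40.0, m)
    # Snapshots: interference plus noise (source-free training data, as in MVDR).
    X = (np.sqrt(10.0) * interf[:, None] * (rng.standard_normal(snaps) + 1j * rng.standard_normal(snaps)) / np.sqrt(2)
         + (rng.standard_normal((m, snaps)) + 1j * rng.standard_normal((m, snaps))) / np.sqrt(2))
    R_hat = X @ X.conj().T / snaps
    w = loaded_mvdr_weights(R_hat, a_presumed, loading=2.0)
    print("response toward the true source direction:", abs(w.conj() @ a_true))
    ```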