
    A categorical foundation for Bayesian probability

    Given two measurable spaces $H$ and $D$ with countably generated $\sigma$-algebras, a perfect prior probability measure $P_H$ on $H$ and a sampling distribution $S: H \rightarrow D$, there is a corresponding inference map $I: D \rightarrow H$ which is unique up to a set of measure zero. Thus, given a data measurement $\mu: 1 \rightarrow D$, a posterior probability $\widehat{P_H} = I \circ \mu$ can be computed. This procedure is iterative: with each updated probability $P_H$, we obtain a new joint distribution, which in turn yields a new inference map $I$, and the process repeats with each additional measurement. The main result uses an existence theorem for regular conditional probabilities by Faden, which holds in more generality than the setting of Polish spaces. This less stringent setting then allows for non-trivial decision rules (Eilenberg–Moore algebras) on finite (as well as non-finite) spaces, and also provides a common framework for decision theory and Bayesian probability.
    Comment: 15 pages; revised setting to more clearly explain how to incorporate perfect measures and the Giry monad; to appear in Applied Categorical Structures
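    As a hedged illustration, the finite case can be written out directly: $H$ and $D$ become finite sets, $P_H$ a probability vector, $S$ a row-stochastic matrix, and the inference map is given pointwise by Bayes' rule, $I(h \mid d) = P_H(h)\,S(d \mid h) / \sum_{h'} P_H(h')\,S(d \mid h')$. This is a sketch, not the paper's measure-theoretic construction; the function names (`inference_map`, `push_forward`, `update`) and the toy coin example are hypothetical. Where the marginal probability of $d$ is zero, any distribution is a valid value for $I(\cdot \mid d)$, mirroring uniqueness only up to a set of measure zero; the code arbitrarily returns the prior there.

```python
from fractions import Fraction

def inference_map(prior, sampling):
    """Bayesian inverse I[d][h] of the sampling matrix S[h][d] w.r.t. the prior."""
    n_h, n_d = len(prior), len(sampling[0])
    inv = []
    for d in range(n_d):
        # Marginal probability of observing outcome d under the joint distribution.
        marginal = sum(prior[h] * sampling[h][d] for h in range(n_h))
        if marginal == 0:
            # Measure-zero outcome: I(.|d) is arbitrary here, so use the prior.
            inv.append(list(prior))
        else:
            inv.append([prior[h] * sampling[h][d] / marginal for h in range(n_h)])
    return inv

def push_forward(inv, measurement):
    """Compose I with a data measurement mu (a distribution on D), i.e. I o mu."""
    n_h = len(inv[0])
    return [sum(measurement[d] * inv[d][h] for d in range(len(measurement)))
            for h in range(n_h)]

def update(prior, sampling, measurement):
    """One iteration: rebuild I from the current prior, then apply it to mu."""
    return push_forward(inference_map(prior, sampling), measurement)

half, q = Fraction(1, 2), Fraction(1, 4)
S = [[3 * q, q],   # hypothesis 0 yields outcome 0 with probability 3/4
     [q, 3 * q]]   # hypothesis 1 yields outcome 0 with probability 1/4
observe_0 = [Fraction(1), Fraction(0)]  # point-mass measurement at outcome 0

p1 = update([half, half], S, observe_0)
p2 = update(p1, S, observe_0)  # the posterior becomes the next prior
print(p1, p2)
```

    Feeding each posterior back in as the next prior reproduces the iterative procedure: two observations of outcome 0 move the probability of hypothesis 0 from 1/2 to 3/4 and then to 9/10.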

    Interpretable statistics for complex modelling: quantile and topological learning

    As the complexity of our data has increased exponentially in recent decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insight. In the first part, we focus on parametric models, where the problem of interpretability can be seen as one of “parametrization selection”. We introduce a quantile-centric parametrization and show its advantages in the context of regression, where it helps bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. Since topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows complex and possibly high-dimensional data to be represented through a few features, such as the number of connected components, loops and voids. We illustrate how Topological Data Analysis, the emerging branch of statistics devoted to recovering topological structures in data, can be exploited for both exploratory and inferential purposes, with special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in identifying and describing brain activity through fMRI data from the ABIDE project.
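    The simplest of the topological features mentioned above, the number of connected components, can be sketched for a point cloud by linking any two points closer than a scale $\varepsilon$ and counting clusters with union-find. This corresponds to the 0-dimensional homology of the Vietoris–Rips complex at that scale. It is a hypothetical illustration (the name `count_components`, the toy points, and the use of Euclidean distance are assumptions), not the kernels or inferential methods developed in the thesis; loops and voids would require higher-dimensional homology.

```python
import math

def count_components(points, eps):
    """Number of connected components of the graph linking points within distance eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two clusters
    return len({find(i) for i in range(len(points))})

# Two tight clusters far apart: the count depends on the scale eps.
cloud = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
for eps in (0.05, 0.5, 10.0):
    print(eps, count_components(cloud, eps))
```

    Tracking how the component count changes as $\varepsilon$ grows is the idea behind a 0-dimensional persistence barcode: here four isolated points merge into two clusters and finally into one.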

    Uncertainty in phylogenetic tree estimates

    Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy and medicine. Although trees are estimated, mathematicians working in tree space discard their uncertainties. Here we explicitly model the multivariate uncertainty of tree estimates. We consider both the case where uncertainty information arises extrinsically (through covariate information) and the case where it arises intrinsically (through the tree estimates themselves). The importance of accounting for tree uncertainty in tree space is demonstrated in two case studies. In the first, differences between gene trees are small relative to their uncertainties, while in the second, the differences are relatively large. Our main goal is the visualization of tree uncertainty, and we demonstrate the advantages of our method over visualization based on multidimensional scaling with respect to reproducibility, speed and preservation of topological differences. The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, resulting in uncertainty information that cannot be discarded. Most importantly, the method allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.
    Comment: Final version accepted to Journal of Computational and Graphical Statistics

    Picturing classical and quantum Bayesian inference

    We introduce a graphical framework for Bayesian inference that is sufficiently general to accommodate not just the standard case but also recent proposals for a theory of quantum Bayesian inference, wherein one considers density operators rather than probability distributions as representative of degrees of belief. The diagrammatic framework is stated in the graphical language of symmetric monoidal categories and of compact structures and Frobenius structures therein, in which Bayesian inversion boils down to transposition with respect to an appropriate compact structure. We characterize classical Bayesian inference in terms of a graphical property and demonstrate that our approach eliminates some purely conventional elements that appear in common representations thereof, such as whether degrees of belief are represented by probabilities or entropic quantities. We also introduce a quantum-like calculus wherein the Frobenius structure is noncommutative and show that it can accommodate Leifer's calculus of 'conditional density operators'. The notion of conditional independence is also generalized to our graphical setting, and we make some preliminary connections to the theory of Bayesian networks. Finally, we demonstrate how to construct a graphical Bayesian calculus within any dagger compact category.
    Comment: 38 pages, lots of pictures
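    In the finite classical case, the idea that Bayesian inversion is transposition can be checked numerically: form the joint matrix $\omega(h, d) = P(h)\,S(d \mid h)$, transpose it, and renormalize each row by the marginal on $D$. The sketch below is a hypothetical finite-dimensional illustration using exact fractions; the names `joint`, `disintegrate` and `bayes_invert` are illustrative, full-support distributions are assumed, and it does not implement the paper's diagrammatic calculus, its Frobenius structures, or the quantum case.

```python
from fractions import Fraction as F

def joint(prior, channel):
    """Joint state omega[x][y] = prior[x] * channel[x][y]."""
    return [[prior[x] * channel[x][y] for y in range(len(channel[x]))]
            for x in range(len(prior))]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def disintegrate(omega):
    """Split a joint state into (marginal over rows, conditional channel)."""
    marginal = [sum(row) for row in omega]
    if any(m == 0 for m in marginal):
        raise ValueError("this sketch assumes full-support marginals")
    return marginal, [[v / m for v in row] for row, m in zip(omega, marginal)]

def bayes_invert(prior, channel):
    """Bayesian inversion as transpose-then-disintegrate of the joint state."""
    return disintegrate(transpose(joint(prior, channel)))

prior = [F(1, 3), F(2, 3)]
channel = [[F(1, 2), F(1, 2)], [F(1, 5), F(4, 5)]]
q, inverse = bayes_invert(prior, channel)            # marginal on D and I(h | d)
p_back, channel_back = bayes_invert(q, inverse)      # invert a second time
print(q, inverse)
```

    Inverting twice returns the original prior and channel, reflecting the fact that transposition is an involution.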

    Diffusion Variational Autoencoders

    A standard Variational Autoencoder, with a Euclidean latent space, is structurally incapable of capturing topological properties of certain datasets. To remove topological obstructions, we introduce Diffusion Variational Autoencoders, which use arbitrary manifolds as latent spaces. A Diffusion Variational Autoencoder uses transition kernels of Brownian motion on the manifold. In particular, it uses properties of Brownian motion to implement the reparametrization trick and fast approximations to the KL divergence. We show that the Diffusion Variational Autoencoder is capable of capturing topological properties of synthetic datasets. Additionally, we train on MNIST with spheres, tori, projective spaces, SO(3), and a torus embedded in $\mathbb{R}^3$ as latent spaces. Although a natural dataset like MNIST does not have latent variables with a clear-cut topological structure, training it with a manifold latent space can still highlight topological and geometrical properties.
    Comment: 10 pages, 8 figures. Added an appendix with a derivation of the asymptotic expansion of the KL divergence for the heat kernel on arbitrary Riemannian manifolds, and an appendix with new experiments on binarized MNIST. Added a previously missing factor in the asymptotic expansion of the heat kernel and corrected a coefficient in the asymptotic expansion of the KL divergence; further minor edits
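    The reparametrization trick described above can be sketched for the 2-sphere: a sample from an approximation of the Brownian-motion transition kernel started at $\mu$ and run for time $t$ is written as a deterministic function of $\mu$, $t$ and externally drawn Gaussian noise, so that in an autodiff framework gradients could flow through $\mu$ and $t$. The sketch approximates Brownian motion by a random walk that projects each Gaussian step onto the tangent plane, takes the step, and renormalizes back onto the sphere. The step count, the time-scaling convention and the name `brownian_on_sphere` are assumptions, the paper's KL-divergence approximations are not shown, and plain Python is used only so that it runs without dependencies.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def project_to_tangent(z, v):
    """Remove the component of v along the unit vector z (tangent plane at z)."""
    dot = sum(zi * vi for zi, vi in zip(z, v))
    return [vi - dot * zi for zi, vi in zip(z, v)]

def brownian_on_sphere(mu, t, noise):
    """Approximate Brownian motion on S^2 started at mu and run for time t.

    `noise` is a list of K >= 1 standard Gaussian 3-vectors drawn outside this
    function (the epsilon of the reparametrization trick), so the output is a
    deterministic function of (mu, t, noise).
    """
    z = normalize(mu)
    step = math.sqrt(t / len(noise))  # per-step standard deviation (assumed scaling)
    for eps in noise:
        tangent = project_to_tangent(z, eps)
        # The tangent step is orthogonal to z, so the norm never drops below 1.
        z = normalize([zi + step * ti for zi, ti in zip(z, tangent)])
    return z

rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
z = brownian_on_sphere([0.0, 0.0, 1.0], 0.1, noise)
print(z)
```

    Keeping the noise outside the function is the design choice that makes the sample reparametrizable: the same `noise` always maps to the same point, and only $\mu$ and $t$ vary during optimization.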

    Inference In The Space Of Topological Maps: An MCMC-based Approach

    ©2004 IEEE. Presented at the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 28 September–2 October 2004, Sendai, Japan. DOI: 10.1109/IROS.2004.1389611
    While probabilistic techniques have been considered extensively in the context of metric maps, no general-purpose probabilistic methods exist for topological maps. We present the concept of Probabilistic Topological Maps (PTMs), a sample-based representation that approximates the posterior distribution over topologies given the available sensor measurements. The PTM is obtained through MCMC-based Bayesian inference over the space of all possible topologies. It is shown that the space of all topologies is equivalent to the space of set partitions of all available measurements. While the space of possible topologies is intractably large, our use of Markov chain Monte Carlo sampling to infer the approximate histograms overcomes the combinatorial nature of this space and provides a general solution to the correspondence problem in the context of topological mapping. We present experimental results that validate our technique and generate good maps even when using only odometry as the sensor measurements.
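    The equivalence between topologies and set partitions of the measurements also shows why exhaustive inference is hopeless: the number of partitions of $n$ measurements is the Bell number $B_n$, which grows faster than exponentially. The sketch below enumerates partitions recursively for small $n$ and computes $B_n$ with the Bell triangle; the helper names are hypothetical, and it counts and lists hypotheses rather than implementing the paper's MCMC sampler. In this reading, each block of a partition groups the measurements attributed to the same place.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition

def bell_numbers(n):
    """Return [B_0, ..., B_n] using the Bell triangle."""
    row, bells = [1], [1]
    for _ in range(n):
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells

measurements = ["m1", "m2", "m3"]
for p in set_partitions(measurements):
    print(p)  # one candidate topology per partition
print(bell_numbers(10))
```

    Three measurements admit only five candidate topologies, but ten already admit 115,975, which is why sampling rather than enumeration is needed.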