    Inference by Minimizing Size, Divergence, or their Sum

    We speed up marginal inference by ignoring factors that do not significantly contribute to overall accuracy. In order to pick a suitable subset of factors to ignore, we propose three schemes: minimizing the number of model factors under a bound on the KL divergence between pruned and full models; minimizing the KL divergence under a bound on factor count; and minimizing the weighted sum of KL divergence and factor count. All three problems are solved using an approximation of the KL divergence that can be calculated in terms of marginals computed on a simple seed graph. Applied to synthetic image denoising and to three different types of NLP parsing models, this technique performs marginal inference up to 11 times faster than loopy BP, with graph sizes reduced by up to 98%, at comparable error in marginals and parsing accuracy. We also show that minimizing the weighted sum of divergence and size is substantially faster than minimizing either of the other objectives based on the approximation to divergence presented here. Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Fast Parallel Randomized Algorithm for Nonnegative Matrix Factorization with KL Divergence for Large Sparse Datasets

    Nonnegative Matrix Factorization (NMF) with Kullback-Leibler Divergence (NMF-KL) is one of the most significant NMF problems and is equivalent to Probabilistic Latent Semantic Indexing (PLSI), which has been successfully applied in many applications. For sparse count data, a Poisson distribution and KL divergence provide sparse models and sparse representation, which describe the random variation better than a normal distribution and Frobenius norm. Specifically, sparse models provide a more concise understanding of the appearance of attributes over latent components, while sparse representation provides concise interpretability of the contribution of latent components over instances. However, minimizing NMF with KL divergence is much more difficult than minimizing NMF with Frobenius norm, and sparse models, sparse representation and fast algorithms for large sparse datasets are still challenges for NMF with KL divergence. In this paper, we propose a fast parallel randomized coordinate descent algorithm with fast convergence for large sparse datasets to achieve sparse models and sparse representation. In experiments, the proposed algorithm outperforms existing approaches to this problem
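To make the NMF-KL objective concrete, the following is a minimal sketch that minimizes the generalized KL divergence D(V || WH) with the classical multiplicative updates. This is a swapped-in baseline chosen for brevity, not the parallel randomized coordinate descent algorithm the abstract proposes; the function names `kl_objective` and `nmf_kl` and all parameter defaults are illustrative assumptions.

```python
import numpy as np


def kl_objective(V, W, H, eps=1e-12):
    """Generalized KL divergence D(V || WH) minimized by NMF-KL."""
    WH = W @ H + eps
    # Entries with V == 0 contribute only through the WH term (0 * log 0 := 0).
    mask = V > 0
    return float(np.sum(V[mask] * np.log(V[mask] / WH[mask])) - V.sum() + WH.sum())


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative-update baseline (assumption: NOT the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    history = []
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (column sums of W)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W <- W * ((V / WH) H^T) / (row sums of H)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_objective(V, W, H, eps))
    return W, H, history


if __name__ == "__main__":
    # Synthetic sparse count data, used only for illustration.
    rng = np.random.default_rng(1)
    V = rng.poisson(1.0, size=(30, 20)).astype(float)
    W, H, history = nmf_kl(V, rank=4, n_iter=50)
    print("objective after first and last iteration:", history[0], history[-1])
```

The updates keep W and H nonnegative by construction, and the recorded objective values give a quick check that the divergence is decreasing. Dense matrix products are used here for clarity; a large sparse dataset would need sparse storage and a different update schedule.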

    Towards efficient music genre classification using FastMap

    Automatic genre classification aims to correctly categorize an unknown recording with a music genre. Recent studies use the Kullback-Leibler (KL) divergence to estimate music similarity and then perform classification using k-nearest neighbours (k-NN). However, this approach is not practical for large databases. We propose an efficient genre classifier that addresses the scalability problem. It uses a combination of a modified FastMap algorithm and KL divergence to return the nearest neighbours, then uses 1-NN for classification. Our experiments showed that high accuracies are obtained while performing classification in less than 1/20 second per track
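As an illustration of KL-based similarity followed by 1-NN, here is a minimal sketch under a stated assumption: each track is summarized by a single Gaussian (mean and covariance) over its audio features, which the abstract does not specify. It uses an exhaustive search, so it shows the baseline whose cost motivates the work; the modified FastMap step is not reproduced. The names `gaussian_kl`, `symmetric_kl`, and `classify_1nn` are hypothetical.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d + logdet1 - logdet0
    )


def symmetric_kl(a, b):
    """Symmetrized KL between two (mean, covariance) track models."""
    return gaussian_kl(*a, *b) + gaussian_kl(*b, *a)


def classify_1nn(query, database):
    """Return the label of the nearest model; database is a list of ((mu, cov), label)."""
    dists = [symmetric_kl(query, model) for model, _ in database]
    return database[int(np.argmin(dists))][1]
```

Each query costs one divergence evaluation per stored track, which is the linear scan that becomes impractical for large databases.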


    The Kullback-Leibler (KL) divergence is one of the most fundamental metrics in information theory and statistics and provides various operational interpretations in the context of mathematical communication theory and statistical hypothesis testing. The KL divergence for discrete distributions has the desired continuity property which leads to some fundamental results in universal hypothesis testing. With continuous observations, however, the KL divergence is only lower semi-continuous; difficulties arise when tackling universal hypothesis testing with continuous observations due to the lack of continuity in KL divergence. This dissertation proposes a robust version of the KL divergence for continuous alphabets. Specifically, the KL divergence defined from a distribution to the Levy ball centered at the other distribution is found to be continuous. This robust version of the KL divergence allows one to generalize the result in universal hypothesis testing for discrete alphabets to that for continuous observations. The optimal decision rule is developed whose robust property is provably established for universal hypothesis testing. Another application of the robust KL divergence is in deviation detection: the problem of detecting deviation from a nominal distribution using a sequence of independent and identically distributed observations. An asymptotically -optimal detector is then developed for deviation detection where the Levy metric becomes a very natural distance measure for deviation from the nominal distribution. Lastly, the dissertation considers the following variation of a distributed detection problem: a sensor may overhear other sensors' transmissions and thus may choose to refine its output in the hope of achieving a better detection performance. While this is shown to be possible for the fixed sample size test, asymptotically (in the number of samples) there is no performance gain, as measured by the KL divergence achievable at the fusion center, provided that the observations are conditionally independent. For conditionally dependent observations, however, asymptotic detection performance may indeed be improved when overhearing is utilized

    Convergence of Langevin MCMC in KL-divergence

    Langevin diffusion is a commonly used tool for sampling from a given distribution. In this work, we establish that when the target density $p^*$ is such that $\log p^*$ is $L$ smooth and $m$ strongly convex, discrete Langevin diffusion produces a distribution $p$ with $KL(p||p^*)\leq \epsilon$ in $\tilde{O}(\frac{d}{\epsilon})$ steps, where $d$ is the dimension of the sample space. We also study the convergence rate when the strong-convexity assumption is absent. By considering the Langevin diffusion as a gradient flow in the space of probability distributions, we obtain an elegant analysis that applies to the stronger property of convergence in KL-divergence and gives a conceptually simpler proof of the best-known convergence results in weaker metrics
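For readers unfamiliar with discretized Langevin diffusion, the following is a minimal sketch of the standard unadjusted Langevin iteration, x <- x - h * grad(-log p*)(x) + sqrt(2h) * noise. The function name `langevin_samples`, the step size, and the Gaussian target are illustrative assumptions; the sketch does not reproduce the step-size schedule or the analysis behind the stated convergence rate.

```python
import numpy as np


def langevin_samples(grad_neg_log_p, x0, step, n_steps, seed=0):
    """Run independent unadjusted Langevin chains, one per row of x0."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step on -log p* plus Gaussian noise scaled by sqrt(2 * step).
        x = x - step * grad_neg_log_p(x) + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    # Hypothetical target: N(mu, I), for which grad(-log p*)(x) = x - mu.
    mu = np.array([1.0, -2.0])
    chains = langevin_samples(lambda x: x - mu, np.zeros((5000, 2)), step=0.05, n_steps=400)
    print("empirical mean:", chains.mean(axis=0))
    print("empirical variance:", chains.var(axis=0))
```

Because no accept/reject correction is applied, the chains settle on a distribution that is slightly biased relative to the target, with a bias that shrinks as the step size decreases. This is why guarantees for the discrete scheme are stated as reaching a given accuracy within a number of steps.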

    Diffusion Variational Autoencoders

    A standard Variational Autoencoder, with a Euclidean latent space, is structurally incapable of capturing topological properties of certain datasets. To remove topological obstructions, we introduce Diffusion Variational Autoencoders with arbitrary manifolds as a latent space. A Diffusion Variational Autoencoder uses transition kernels of Brownian motion on the manifold. In particular, it uses properties of the Brownian motion to implement the reparametrization trick and fast approximations to the KL divergence. We show that the Diffusion Variational Autoencoder is capable of capturing topological properties of synthetic datasets. Additionally, we train MNIST on spheres, tori, projective spaces, SO(3), and a torus embedded in $\mathbb{R}^3$. Although a natural dataset like MNIST does not have latent variables with a clear-cut topological structure, training it on a manifold can still highlight topological and geometrical properties. Comment: 10 pages, 8 figures. Added an appendix with derivation of asymptotic expansion of KL divergence for heat kernel on arbitrary Riemannian manifolds, and an appendix with new experiments on binarized MNIST. Added a previously missing factor in the asymptotic expansion of the heat kernel and corrected a coefficient in the asymptotic expansion of the KL divergence; further minor edit