27,256 research outputs found
Inference by Minimizing Size, Divergence, or their Sum
We speed up marginal inference by ignoring factors that do not significantly
contribute to overall accuracy. In order to pick a suitable subset of factors
to ignore, we propose three schemes: minimizing the number of model factors
under a bound on the KL divergence between pruned and full models; minimizing
the KL divergence under a bound on factor count; and minimizing the weighted
sum of KL divergence and factor count. All three problems are solved using an
approximation of the KL divergence than can be calculated in terms of marginals
computed on a simple seed graph. Applied to synthetic image denoising and to
three different types of NLP parsing models, this technique performs marginal
inference up to 11 times faster than loopy BP, with graph sizes reduced up to
98%-at comparable error in marginals and parsing accuracy. We also show that
minimizing the weighted sum of divergence and size is substantially faster than
minimizing either of the other objectives based on the approximation to
divergence presented here.Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty
in Artificial Intelligence (UAI2010
Fast Parallel Randomized Algorithm for Nonnegative Matrix Factorization with KL Divergence for Large Sparse Datasets
Nonnegative Matrix Factorization (NMF) with Kullback-Leibler Divergence
(NMF-KL) is one of the most significant NMF problems and equivalent to
Probabilistic Latent Semantic Indexing (PLSI), which has been successfully
applied in many applications. For sparse count data, a Poisson distribution and
KL divergence provide sparse models and sparse representation, which describe
the random variation better than a normal distribution and Frobenius norm.
Specially, sparse models provide more concise understanding of the appearance
of attributes over latent components, while sparse representation provides
concise interpretability of the contribution of latent components over
instances. However, minimizing NMF with KL divergence is much more difficult
than minimizing NMF with Frobenius norm; and sparse models, sparse
representation and fast algorithms for large sparse datasets are still
challenges for NMF with KL divergence. In this paper, we propose a fast
parallel randomized coordinate descent algorithm having fast convergence for
large sparse datasets to archive sparse models and sparse representation. The
proposed algorithm's experimental results overperform the current studies' ones
in this problem
Towards efficient music genre classification using FastMap
Automatic genre classification aims to correctly categorize an unknown recording with a music genre. Recent studies use the Kullback-Leibler (KL) divergence to estimate music similarity then perform classification using k-nearest neighbours (k-NN). However, this approach is not practical for large databases. We propose an efficient genre classifier that addresses the scalability problem. It uses a combination of modified FastMap algorithm and KL divergence to return the nearest neighbours then use 1- NN for classification. Our experiments showed that high accuracies are obtained while performing classification in less than 1/20 second per track
ROBUST KULLBACK-LEIBLER DIVERGENCE AND ITS APPLICATIONS IN UNIVERSAL HYPOTHESIS TESTING AND DEVIATION DETECTION
The Kullback-Leibler (KL) divergence is one of the most fundamental metrics in information theory and statistics and provides various operational interpretations in the context of mathematical communication theory and statistical hypothesis testing. The KL divergence for discrete distributions has the desired continuity property which leads to some fundamental results in universal hypothesis testing. With continuous observations, however, the KL divergence is only lower semi-continuous; difficulties arise when tackling universal hypothesis testing with continuous observations due to the lack of continuity in KL divergence.
This dissertation proposes a robust version of the KL divergence for continuous alphabets. Specifically, the KL divergence defined from a distribution to the Levy ball centered at the other distribution is found to be continuous. This robust version of the KL divergence allows one to generalize the result in universal hypothesis testing for discrete alphabets to that for continuous observations. The optimal decision rule is developed whose robust property is provably established for universal hypothesis testing.
Another application of the robust KL divergence is in deviation detection: the problem of detecting deviation from a nominal distribution using a sequence of independent and identically distributed observations. An asymptotically -optimal detector is then developed for deviation detection where the Levy metric becomes a very natural distance measure for deviation from the nominal distribution.
Lastly, the dissertation considers the following variation of a distributed detection problem: a sensor may overhear other sensors\u27 transmissions and thus may choose to refine its output in the hope of achieving a better detection performance. While this is shown to be possible for the fixed sample size test, asymptotically (in the number of samples) there is no performance gain, as measured by the KL divergence achievable at the fusion center, provided that the observations are conditionally independent. For conditionally dependent observations, however, asymptotic detection performance may indeed be improved when overhearing is utilized
Convergence of Langevin MCMC in KL-divergence
Langevin diffusion is a commonly used tool for sampling from a given
distribution. In this work, we establish that when the target density is
such that is smooth and strongly convex, discrete Langevin
diffusion produces a distribution with in
steps, where is the dimension of the sample
space. We also study the convergence rate when the strong-convexity assumption
is absent. By considering the Langevin diffusion as a gradient flow in the
space of probability distributions, we obtain an elegant analysis that applies
to the stronger property of convergence in KL-divergence and gives a
conceptually simpler proof of the best-known convergence results in weaker
metrics
Diffusion Variational Autoencoders
A standard Variational Autoencoder, with a Euclidean latent space, is
structurally incapable of capturing topological properties of certain datasets.
To remove topological obstructions, we introduce Diffusion Variational
Autoencoders with arbitrary manifolds as a latent space. A Diffusion
Variational Autoencoder uses transition kernels of Brownian motion on the
manifold. In particular, it uses properties of the Brownian motion to implement
the reparametrization trick and fast approximations to the KL divergence. We
show that the Diffusion Variational Autoencoder is capable of capturing
topological properties of synthetic datasets. Additionally, we train MNIST on
spheres, tori, projective spaces, SO(3), and a torus embedded in R3. Although a
natural dataset like MNIST does not have latent variables with a clear-cut
topological structure, training it on a manifold can still highlight
topological and geometrical properties.Comment: 10 pages, 8 figures Added an appendix with derivation of asymptotic
expansion of KL divergence for heat kernel on arbitrary Riemannian manifolds,
and an appendix with new experiments on binarized MNIST. Added a previously
missing factor in the asymptotic expansion of the heat kernel and corrected a
coefficient in asymptotic expansion KL divergence; further minor edit
- …