Inference by Minimizing Size, Divergence, or their Sum
We speed up marginal inference by ignoring factors that do not significantly
contribute to overall accuracy. In order to pick a suitable subset of factors
to ignore, we propose three schemes: minimizing the number of model factors
under a bound on the KL divergence between pruned and full models; minimizing
the KL divergence under a bound on factor count; and minimizing the weighted
sum of KL divergence and factor count. All three problems are solved using an
approximation of the KL divergence that can be calculated in terms of marginals
computed on a simple seed graph. Applied to synthetic image denoising and to
three different types of NLP parsing models, this technique performs marginal
inference up to 11 times faster than loopy BP, with graph sizes reduced by up to
98%, at comparable error in marginals and parsing accuracy. We also show that
minimizing the weighted sum of divergence and size is substantially faster than
minimizing either of the other objectives based on the approximation to
divergence presented here.
Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty
in Artificial Intelligence (UAI2010)
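
The abstract does not spell out how the approximate divergence is used inside each of the three schemes. As a minimal sketch, assume (hypothetically) that each factor f has a precomputed score `costs[f]` estimating how much divergence its removal adds, and that these scores simply add up over the removed factors. This additivity is an assumption made for illustration, not something the abstract claims. Under it, the two bounded problems reduce to sorting the scores, and the weighted-sum problem reduces to a per-factor threshold. All function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical input: costs[f] = estimated divergence added by removing factor f.
# Assumption for illustration only: these estimates are additive over removed factors.


def prune_weighted_sum(costs: Dict[str, float], lam: float) -> Set[str]:
    """Minimize (divergence of removed factors) + lam * (number of kept factors).

    Keeping a factor costs lam and removing it costs its estimate, so each
    factor is decided on its own: keep it iff its estimate exceeds lam.
    Returns the set of kept factors.
    """
    return {f for f, d in costs.items() if d > lam}


def prune_under_divergence_bound(costs: Dict[str, float], bound: float) -> Set[str]:
    """Minimize the number of kept factors subject to
    (divergence of removed factors) <= bound.

    Removing the cheapest factors first removes as many factors as possible.
    """
    kept = set(costs)
    spent = 0.0
    for f, d in sorted(costs.items(), key=lambda kv: kv[1]):
        if spent + d > bound:
            break  # every remaining factor costs at least d, so none fits
        spent += d
        kept.discard(f)
    return kept


def prune_under_size_bound(costs: Dict[str, float], max_factors: int) -> Set[str]:
    """Minimize (divergence of removed factors) subject to keeping at most
    max_factors factors: keep the factors whose removal would cost the most."""
    ranked = sorted(costs, key=lambda f: costs[f], reverse=True)
    return set(ranked[:max_factors])


if __name__ == "__main__":
    costs = {"f1": 0.001, "f2": 0.2, "f3": 0.05, "f4": 0.0004}
    print(sorted(prune_weighted_sum(costs, lam=0.01)))         # ['f2', 'f3']
    print(sorted(prune_under_divergence_bound(costs, 0.06)))   # ['f2']
    print(sorted(prune_under_size_bound(costs, 2)))            # ['f2', 'f3']
```

In this toy setting, raising `lam` trades accuracy for a smaller graph in the same way that loosening the divergence bound or tightening the size bound does in the other two schemes.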
Fast Parallel Randomized Algorithm for Nonnegative Matrix Factorization with KL Divergence for Large Sparse Datasets
Nonnegative Matrix Factorization (NMF) with Kullback-Leibler Divergence
(NMF-KL) is one of the most important NMF problems and is equivalent to
Probabilistic Latent Semantic Indexing (PLSI), which has been applied
successfully in many domains. For sparse count data, a Poisson distribution and
the KL divergence yield sparse models and sparse representations, which describe
the random variation better than a normal distribution and the Frobenius norm.
In particular, sparse models provide a more concise understanding of the appearance
of attributes over latent components, while sparse representation provides
concise interpretability of the contribution of latent components over
instances. However, NMF with KL divergence is much more difficult to minimize
than NMF with the Frobenius norm, and sparse models, sparse representations,
and fast algorithms for large sparse datasets remain challenges for NMF with KL
divergence. In this paper, we propose a fast parallel randomized coordinate
descent algorithm with fast convergence on large sparse datasets that achieves
sparse models and sparse representations. In our experiments, the proposed
algorithm outperforms the methods of current studies on this problem.
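
The abstract does not give the update rules of the proposed parallel randomized coordinate descent method, so the sketch below substitutes a different, well-known technique: the Lee-Seung multiplicative updates for NMF under the generalized KL divergence D(V || WH) = sum(v log(v / (WH)) - v + (WH)). It only illustrates the objective being minimized; it is not the authors' algorithm, and it is neither parallel nor sparse-aware. The helper names and the plain-list matrix representation are choices made for this example.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def kl_divergence(v: Matrix, wh: Matrix) -> float:
    """Generalized KL divergence D(V || WH) = sum(v*log(v/wh) - v + wh).

    Entries with v == 0 contribute only wh, using the convention 0*log(0) = 0.
    """
    total = 0.0
    for v_row, wh_row in zip(v, wh):
        for x, y in zip(v_row, wh_row):
            if x > 0:
                total += x * math.log(x / max(y, 1e-300))
            total += y - x
    return total


def nmf_kl(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
           eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor a nonnegative m x n matrix V as W (m x rank) times H (rank x n)
    with Lee-Seung multiplicative updates, which (up to the small eps guard)
    do not increase D(V || WH)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(m)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H[a][j] *= (sum_i W[i][a] * V[i][j] / (WH)[i][j]) / sum_i W[i][a]
        wh = matmul(w, h)
        for a in range(rank):
            w_col_sum = sum(w[i][a] for i in range(m)) + eps
            for j in range(n):
                num = sum(w[i][a] * v[i][j] / (wh[i][j] + eps) for i in range(m))
                h[a][j] *= num / w_col_sum
        # W[i][a] *= (sum_j H[a][j] * V[i][j] / (WH)[i][j]) / sum_j H[a][j]
        wh = matmul(w, h)
        for a in range(rank):
            h_row_sum = sum(h[a][j] for j in range(n)) + eps
            for i in range(m):
                num = sum(h[a][j] * v[i][j] / (wh[i][j] + eps) for j in range(n))
                w[i][a] *= num / h_row_sum
    return w, h


if __name__ == "__main__":
    # Small sparse count matrix (zeros are common in count data).
    v = [[1, 0, 2, 3, 0],
         [0, 1, 1, 0, 2],
         [2, 1, 3, 3, 2],
         [1, 0, 2, 3, 0]]
    w0, h0 = nmf_kl(v, rank=2, iters=0)   # the random initialization only
    w, h = nmf_kl(v, rank=2, iters=200)
    print(kl_divergence(v, matmul(w0, h0)), "->", kl_divergence(v, matmul(w, h)))
```

Because every update multiplies by a nonnegative ratio, W and H stay nonnegative without any projection step; the price is that each update touches every entry of V, which is one reason specialized methods are proposed for large sparse datasets.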
On the Universality of the Logistic Loss Function
A loss function measures the discrepancy between the true values
(observations) and their estimated fits, for a given instance of data. A loss
function is said to be proper (unbiased, Fisher consistent) if the fits are
defined over a unit simplex, and the minimizer of the expected loss is the true
underlying probability of the data. Typical examples are the zero-one loss, the
quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work
we show that for binary classification problems, the divergence associated with
smooth, proper and convex loss functions is bounded from above by the
Kullback-Leibler (KL) divergence, up to a multiplicative normalization
constant. This implies that by minimizing the log-loss (associated with the KL
divergence), we minimize an upper bound on the divergence associated with any
loss function from this set. This property justifies the broad use of log-loss in regression,
decision trees, deep neural networks and many other applications. In addition,
we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a multiplicative
normalization constant). This result introduces a new set of divergence
inequalities, similar to the well-known Pinsker inequality.
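
As a small numerical illustration (not the paper's proof), the sketch below checks the binary-classification claim for one smooth, proper, convex loss. The divergence induced by a proper loss is taken here as its excess expected loss (regret) of a fit q relative to the true probability p. For the log-loss this regret equals the Bernoulli KL divergence; for the quadratic loss (y - q)^2 it equals (p - q)^2, and Pinsker's inequality KL >= 2(p - q)^2 (natural logarithm) then bounds it by KL/2. The constant 1/2 depends on how the quadratic loss is scaled, so the paper's normalization may differ. Function names are illustrative.

```python
import math


def expected_log_loss(p: float, q: float) -> float:
    """Expectation over y ~ Bernoulli(p) of the log-loss -y*log(q) - (1-y)*log(1-q)."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


def expected_quadratic_loss(p: float, q: float) -> float:
    """Expectation over y ~ Bernoulli(p) of the quadratic loss (y - q)**2."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


def divergence(expected_loss, p: float, q: float) -> float:
    """Divergence induced by a proper loss: the excess expected loss of the
    fit q over the optimal fit q = p (nonnegative for every q when the loss is proper)."""
    return expected_loss(p, q) - expected_loss(p, p)


def kl_bernoulli(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


if __name__ == "__main__":
    grid = [i / 20 for i in range(1, 20)]  # probabilities strictly inside (0, 1)
    pairs = [(p, q) for p in grid for q in grid if p != q]
    # Log-loss induces exactly the KL divergence.
    print(all(math.isclose(divergence(expected_log_loss, p, q), kl_bernoulli(p, q),
                           abs_tol=1e-12) for p, q in pairs))          # True
    # Quadratic loss induces (p - q)**2; Pinsker gives KL >= 2 * (p - q)**2.
    worst = max(divergence(expected_quadratic_loss, p, q) / kl_bernoulli(p, q)
                for p, q in pairs)
    print(worst <= 0.5)                                                # True
```

The largest ratios on the grid occur near p = q = 1/2, where the quadratic divergence comes closest to the KL/2 bound.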