-MLE: A fast algorithm for learning statistical mixture models
We describe k-MLE, a fast and efficient local search algorithm for learning
finite statistical mixtures of exponential families such as Gaussian mixture
models. Mixture models are traditionally learned using the
expectation-maximization (EM) soft clustering technique that monotonically
increases the incomplete (expected complete) likelihood. Given prescribed
mixture weights, the hard clustering k-MLE algorithm iteratively assigns data
to the most likely weighted component and updates the component models using
Maximum Likelihood Estimators (MLEs). Using the duality between exponential
families and Bregman divergences, we prove that the local convergence of the
complete likelihood of k-MLE follows directly from the convergence of a dual
additively weighted Bregman hard clustering. The inner loop of k-MLE can be
implemented using any k-means heuristic, such as the celebrated Lloyd's batched
or Hartigan's greedy swap updates. We then show how to update the mixture
weights by minimizing a cross-entropy criterion, which amounts to setting each
weight to the relative proportion of points in its cluster; we reiterate the mixture
parameter update and mixture weight update processes until convergence. Hard EM
is interpreted as a special case of k-MLE when both the component update and
the weight update are performed successively in the inner loop. To initialize
k-MLE, we propose k-MLE++, a careful initialization of k-MLE that probabilistically
guarantees a global bound on the best possible complete likelihood.
Comment: 31 pages, extends a preliminary paper presented at IEEE ICASSP 201
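To make the assignment/update loop concrete, here is a minimal Python sketch (our illustration, not the paper's implementation) using spherical Gaussian components. It is closer to the hard-EM special case mentioned above, with the component and weight updates performed successively in the same loop; the function name hard_mixture_em is ours.

```python
import numpy as np

def hard_mixture_em(X, k, n_iters=50, seed=None):
    """Hard-assignment mixture learning sketch: assign each point to its most
    likely weighted Gaussian component, refit each component by maximum
    likelihood, then refresh the mixture weights as cluster proportions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize means from random data points (a k-MLE++-style seeding
    # would pick them more carefully).
    mu = X[rng.choice(n, size=k, replace=False)]
    var = np.full(k, X.var())          # spherical variances
    w = np.full(k, 1.0 / k)            # mixture weights

    for _ in range(n_iters):
        # Weighted log-density of every point under every component.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logp = (np.log(w) - 0.5 * d * np.log(2 * np.pi * var)
                - 0.5 * sq / var)
        z = logp.argmax(axis=1)        # hard assignment

        # MLE update of each component, then weight update by proportions.
        for j in range(k):
            pts = X[z == j]
            if len(pts) == 0:
                continue               # keep previous parameters for empty clusters
            mu[j] = pts.mean(axis=0)
            var[j] = max(pts.var(), 1e-9)
            w[j] = len(pts) / n
        w /= w.sum()
    return w, mu, var, z
```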
Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means
Bayesian classification labels observations based on given prior information,
namely the class a priori and class-conditional probabilities. Bayes' risk is the
minimum expected classification cost, achieved by Bayes' test, the
optimal decision rule. When no cost is incurred for correct classification and a unit
cost is charged for misclassification, Bayes' test reduces to the maximum a
posteriori decision rule, and Bayes' risk simplifies to Bayes' error, the
probability of error. Since calculating this probability of error is often
intractable, several techniques have been devised to bound it with closed-form
formulas, thereby introducing measures of similarity and divergence between
distributions like the Bhattacharyya coefficient and its associated
Bhattacharyya distance. The Bhattacharyya upper bound can further be tightened
using the Chernoff information that relies on the notion of best error
exponent. In this paper, we first express Bayes' risk using the total variation
distance on scaled distributions. We then elucidate and extend the
Bhattacharyya and the Chernoff upper bound mechanisms using generalized
weighted means. We provide as a byproduct novel notions of statistical
divergences and affinity coefficients. We illustrate our technique by deriving
new upper bounds for the univariate Cauchy and the multivariate t-distributions,
and show experimentally that those bounds are not too distant from the
computationally intractable Bayes' error.
Comment: 22 pages, includes R code. To appear in Pattern Recognition Letters
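As a sanity check on how tight such closed-form bounds can be, the following Python snippet (ours, not the paper's R code) compares the Bhattacharyya upper bound sqrt(w1 w2) exp(-D_B) with a numerically integrated Bayes error for two univariate Gaussian class-conditional densities.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bhattacharyya_bound(w1, mu1, s1, w2, mu2, s2):
    """Closed-form Bhattacharyya upper bound on the two-class Bayes error
    for univariate Gaussian class-conditional densities."""
    db = ((mu1 - mu2) ** 2) / (4 * (s1**2 + s2**2)) \
         + 0.5 * np.log((s1**2 + s2**2) / (2 * s1 * s2))
    return np.sqrt(w1 * w2) * np.exp(-db)

def bayes_error_numeric(w1, mu1, s1, w2, mu2, s2):
    """Brute-force Bayes error: integrate min(w1 p1, w2 p2) on a wide grid."""
    x = np.linspace(min(mu1, mu2) - 10 * max(s1, s2),
                    max(mu1, mu2) + 10 * max(s1, s2), 200_001)
    integrand = np.minimum(w1 * gaussian_pdf(x, mu1, s1),
                           w2 * gaussian_pdf(x, mu2, s2))
    return np.trapz(integrand, x)

if __name__ == "__main__":
    params = (0.4, 0.0, 1.0, 0.6, 2.0, 1.5)   # priors, means, std devs
    print("Bhattacharyya bound  :", bhattacharyya_bound(*params))
    print("Numerical Bayes error:", bayes_error_numeric(*params))
```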
On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means
The Jensen-Shannon divergence is a renowned bounded symmetrization of the
unbounded Kullback-Leibler divergence, which measures the total Kullback-Leibler
divergence to the average mixture distribution. However, the Jensen-Shannon
divergence between Gaussian distributions is not available in closed-form. To
bypass this problem, we present a generalization of the Jensen-Shannon (JS)
divergence using abstract means which yields closed-form expressions when the
mean is chosen according to the parametric family of distributions. More
generally, we define the JS-symmetrizations of any distance using generalized
statistical mixtures derived from abstract means. In particular, we first show
that the geometric mean is well-suited for exponential families, and report two
closed-form formulas for (i) the geometric Jensen-Shannon divergence between
probability densities of the same exponential family, and (ii) the geometric
JS-symmetrization of the reverse Kullback-Leibler divergence. As a second
illustrating example, we show that the harmonic mean is well-suited for the
scale Cauchy distributions, and report a closed-form formula for the harmonic
Jensen-Shannon divergence between scale Cauchy distributions. We also define
generalized Jensen-Shannon divergences between matrices (e.g., quantum
Jensen-Shannon divergences) and consider clustering with respect to these novel
Jensen-Shannon divergences.
Comment: 30 pages
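To illustrate the closed-form flavor of the geometric Jensen-Shannon construction, here is a small Python sketch (our instantiation, with parameter names of our choosing) for univariate Gaussians: the normalized weighted geometric mean of two Gaussian densities is again a Gaussian, so both Kullback-Leibler terms admit the standard Gaussian closed form.

```python
import numpy as np

def kl_gauss(mu_a, var_a, mu_b, var_b):
    """KL(N(mu_a, var_a) || N(mu_b, var_b)) for univariate Gaussians."""
    return 0.5 * (np.log(var_b / var_a)
                  + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0)

def geometric_js_gauss(mu1, var1, mu2, var2, alpha=0.5):
    """Geometric JS-type divergence: replace the arithmetic mixture of the
    ordinary Jensen-Shannon divergence by the normalized weighted geometric
    mean of the two densities, which stays Gaussian."""
    # Normalized geometric mean p^(1-alpha) q^alpha is a Gaussian whose
    # precision is the weighted sum of the two precisions.
    prec = (1 - alpha) / var1 + alpha / var2
    var_g = 1.0 / prec
    mu_g = var_g * ((1 - alpha) * mu1 / var1 + alpha * mu2 / var2)
    return ((1 - alpha) * kl_gauss(mu1, var1, mu_g, var_g)
            + alpha * kl_gauss(mu2, var2, mu_g, var_g))

if __name__ == "__main__":
    print(geometric_js_gauss(0.0, 1.0, 3.0, 2.0))   # positive for distinct Gaussians
    print(geometric_js_gauss(1.0, 0.5, 1.0, 0.5))   # zero for identical Gaussians
```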
Cramer-Rao Lower Bound and Information Geometry
This article focuses on an important piece of work of the world renowned
Indian statistician, Calyampudi Radhakrishna Rao. In 1945, C. R. Rao (25 years
old then) published a pathbreaking paper, which had a profound impact on
subsequent statistical research.
Comment: To appear in Connected at Infinity II: On the work of Indian
mathematicians (R. Bhatia and C.S. Rajan, Eds.), special volume of Texts and
Readings In Mathematics (TRIM), Hindustan Book Agency, 201
Derivatives of Multilinear Functions of Matrices
Perturbation or error bounds of functions have been of great interest for a
long time. If the functions are differentiable, then the mean value theorem and
Taylor's theorem come in handy for this purpose. While the former is useful in
estimating $\|f(A+X)-f(A)\|$ in terms of $\|X\|$ and requires the norms of the
first derivative of the function, the latter is useful in computing higher
order perturbation bounds and needs norms of the higher order derivatives of
the function.
In the study of matrices, the determinant is an important function. Other
scalar-valued functions, like eigenvalues and the coefficients of the characteristic
polynomial, are also well studied. Another interesting function of this category is the
permanent, which is an analogue of the determinant in matrix theory. More
generally, there are operator valued functions like tensor powers,
antisymmetric tensor powers and symmetric tensor powers which have gained
importance in the past. In this article, we give a survey of the recent work on
the higher order derivatives of these functions and their norms. Using Taylor's
theorem, higher order perturbation bounds are obtained. Some of these results
are very recent and their detailed proofs will appear elsewhere.
Comment: 17 pages
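As a concrete instance of a first-order perturbation estimate for the determinant (our illustration, not taken from the survey), Jacobi's formula gives the derivative of det at an invertible A in the direction X as tr(adj(A) X); the snippet below checks det(A + tX) - det(A) against t tr(adj(A) X) for a small t.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))
t = 1e-6

# Jacobi's formula: D det(A)(X) = tr(adj(A) X), with adj(A) = det(A) A^{-1}
# for invertible A.
adj_A = np.linalg.det(A) * np.linalg.inv(A)
first_order = t * np.trace(adj_A @ X)

exact_change = np.linalg.det(A + t * X) - np.linalg.det(A)

print("exact change      :", exact_change)
print("first-order term  :", first_order)
print("remainder, O(t^2) :", exact_change - first_order)
```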