6,204 research outputs found

    k-MLE: A fast algorithm for learning statistical mixture models

    We describe k-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique, which monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering k-MLE algorithm iteratively assigns data to the most likely weighted component and updates the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of k-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of k-MLE can be implemented using any k-means heuristic, such as Lloyd's celebrated batched updates or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion, which amounts to setting each weight to the relative proportion of points in its cluster, and we reiterate the mixture parameter and mixture weight updates until convergence. Hard EM is interpreted as a special case of k-MLE in which the component update and the weight update are performed successively in the inner loop. To initialize k-MLE, we propose k-MLE++, a careful initialization of k-MLE that probabilistically guarantees a global bound on the best possible complete likelihood.
    Comment: 31 pages. Extends a preliminary paper presented at IEEE ICASSP 201
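
    As a rough illustration of the loop sketched in this abstract, here is a minimal NumPy sketch of a hard-assignment k-MLE-style iteration for diagonal Gaussian components. The function name, random seeding, diagonal-covariance restriction, and empty-cluster handling are our own simplifications for illustration (the paper itself proposes k-MLE++ seeding and covers general exponential families).

```python
import numpy as np


def k_mle_diag_gaussian(X, k, n_iters=100, eps=1e-9, seed=0):
    """Hard-assignment k-MLE-style loop for a mixture of diagonal Gaussians (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)].astype(float)  # component means
    var = np.ones((k, d))                                        # diagonal variances
    w = np.full(k, 1.0 / k)                                      # mixture weights
    labels = np.full(n, -1)

    for _ in range(n_iters):
        # 1) Assignment: send each point to its most likely *weighted* component.
        log_dens = -0.5 * np.sum(
            ((X[:, None, :] - mu[None, :, :]) ** 2) / var + np.log(2 * np.pi * var),
            axis=2,
        )
        new_labels = np.argmax(np.log(w) + log_dens, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: the complete likelihood has converged locally
        labels = new_labels

        # 2) Component update: per-cluster maximum likelihood estimators.
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:                      # re-seed an empty cluster
                mu[j], var[j] = X[rng.integers(n)], np.ones(d)
                continue
            mu[j], var[j] = pts.mean(axis=0), pts.var(axis=0) + eps

        # 3) Weight update: relative proportion of points assigned to each cluster.
        w = np.clip(np.bincount(labels, minlength=k) / n, eps, None)

    return mu, var, w, labels
```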

    On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means

    The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using generalized statistical mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen-Shannon divergence between probability densities of the same exponential family, and (ii) the geometric JS-symmetrization of the reverse Kullback-Leibler divergence. As a second illustrating example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen-Shannon divergence between scale Cauchy distributions. We also define generalized Jensen-Shannon divergences between matrices (e.g., quantum Jensen-Shannon divergences) and consider clustering with respect to these novel Jensen-Shannon divergences.
    Comment: 30 pages
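
    To give a concrete feel for the closed-form idea in the Gaussian (exponential family) case, here is a small sketch under one common convention: the skewed average of the KL divergences from each density to the normalized geometric mixture, mirroring the description of the ordinary JSD above. The univariate restriction, the convention, and the function names are our assumptions; the paper's exact definitions and skew parameters may differ.

```python
import numpy as np


def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians with variances v1, v2."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def geometric_jsd_gauss(m1, v1, m2, v2, alpha=0.5):
    """Skewed geometric JS divergence between two univariate Gaussians (sketch).

    The normalized weighted geometric mean p^(1-alpha) * q^alpha of two Gaussians
    is again a Gaussian whose precision (and precision-weighted mean) is the
    corresponding weighted average; this is what makes a closed form possible.
    """
    prec = (1 - alpha) / v1 + alpha / v2          # precision of the geometric mixture
    vg = 1.0 / prec
    mg = vg * ((1 - alpha) * m1 / v1 + alpha * m2 / v2)
    return (1 - alpha) * kl_gauss(m1, v1, mg, vg) + alpha * kl_gauss(m2, v2, mg, vg)


# Example: closed-form value, whereas the ordinary JSD between Gaussians has none.
print(geometric_jsd_gauss(0.0, 1.0, 2.0, 3.0))
```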

    Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means

    Bayesian classification labels observations based on given prior information, namely class prior and class-conditional probabilities. Bayes' risk is the minimum expected classification cost, which is achieved by the Bayes' test, the optimal decision rule. When no cost is incurred for correct classification and unit cost is charged for misclassification, Bayes' test reduces to the maximum a posteriori decision rule, and Bayes' risk simplifies to Bayes' error, the probability of error. Since calculating this probability of error is often intractable, several techniques have been devised to bound it with closed-form formulas, thereby introducing measures of similarity and divergence between distributions such as the Bhattacharyya coefficient and its associated Bhattacharyya distance. The Bhattacharyya upper bound can further be tightened using the Chernoff information, which relies on the notion of best error exponent. In this paper, we first express Bayes' risk using the total variation distance on scaled distributions. We then elucidate and extend the Bhattacharyya and Chernoff upper bound mechanisms using generalized weighted means. As a byproduct, we provide novel notions of statistical divergences and affinity coefficients. We illustrate our technique by deriving new upper bounds for the univariate Cauchy and the multivariate t-distributions, and show experimentally that those bounds are not too far from the computationally intractable Bayes' error.
    Comment: 22 pages, includes R code. To appear in Pattern Recognition Letters
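
    For orientation, here is the classical Bhattacharyya bounding mechanism the paper starts from, sketched in Python for two univariate Gaussian class-conditional densities and compared against a numerically integrated Bayes' error. This is the textbook Gaussian case, not the new Cauchy or t-distribution bounds of the paper; parameter values and function names are ours.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def bhattacharyya_bayes_bound(m1, v1, m2, v2, prior1=0.5):
    """Bhattacharyya upper bound on the Bayes error for two univariate Gaussians."""
    # Closed-form Bhattacharyya distance between N(m1, v1) and N(m2, v2).
    db = (m1 - m2) ** 2 / (4.0 * (v1 + v2)) + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    bc = np.exp(-db)                              # Bhattacharyya coefficient
    # Pointwise min(a, b) <= sqrt(a * b) gives  P_error <= sqrt(pi1 * pi2) * BC.
    return np.sqrt(prior1 * (1.0 - prior1)) * bc


m1, v1, m2, v2, p1 = 0.0, 1.0, 2.0, 2.0, 0.5
bound = bhattacharyya_bayes_bound(m1, v1, m2, v2, p1)
# Bayes' error itself: integrate min(pi1 * p1(x), pi2 * p2(x)) numerically.
bayes_err, _ = quad(
    lambda x: min(p1 * norm.pdf(x, m1, np.sqrt(v1)), (1 - p1) * norm.pdf(x, m2, np.sqrt(v2))),
    -30, 30,
)
print(f"Bayes error ~ {bayes_err:.4f} <= Bhattacharyya bound {bound:.4f}")
```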

    Cramer-Rao Lower Bound and Information Geometry

    This article focuses on an important piece of work by the world-renowned Indian statistician Calyampudi Radhakrishna Rao. In 1945, C. R. Rao (then 25 years old) published a pathbreaking paper which had a profound impact on subsequent statistical research.
    Comment: To appear in Connected at Infinity II: On the work of Indian mathematicians (R. Bhatia and C.S. Rajan, Eds.), special volume of Texts and Readings In Mathematics (TRIM), Hindustan Book Agency, 201

    Derivatives of Multilinear Functions of Matrices

    Perturbation or error bounds of functions have been of great interest for a long time. If the functions are differentiable, then the mean value theorem and Taylor's theorem come in handy for this purpose. While the former is useful in estimating ∥f(A+X)−f(A)∥ in terms of ∥X∥ and requires the norm of the first derivative of the function, the latter is useful in computing higher-order perturbation bounds and needs norms of the higher-order derivatives of the function. In the study of matrices, the determinant is an important function. Other scalar-valued functions like eigenvalues and the coefficients of the characteristic polynomial are also well studied. Another interesting function of this category is the permanent, an analogue of the determinant in matrix theory. More generally, there are operator-valued functions like tensor powers, antisymmetric tensor powers and symmetric tensor powers which have gained importance in the past. In this article, we give a survey of the recent work on the higher-order derivatives of these functions and their norms. Using Taylor's theorem, higher-order perturbation bounds are obtained. Some of these results are very recent and their detailed proofs will appear elsewhere.
    Comment: 17 pages
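
    The determinant case admits a quick numerical illustration of the first-order (mean value / Taylor) perturbation estimate via Jacobi's formula for the derivative of the determinant. This is a standard fact used here for illustration, not code from the survey; the matrices are random examples of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
X = 1e-3 * rng.normal(size=(4, 4))        # small perturbation

# Jacobi's formula gives the first derivative of the determinant:
#   D det(A)[X] = tr(adj(A) X) = det(A) * tr(A^{-1} X)   (for invertible A).
first_order = np.linalg.det(A) * np.trace(np.linalg.solve(A, X))
exact_change = np.linalg.det(A + X) - np.linalg.det(A)

# The residual is O(||X||^2), as a first-order Taylor expansion predicts.
print(exact_change, first_order, abs(exact_change - first_order))
```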