k-MLE: A fast algorithm for learning statistical mixture models
We describe k-MLE, a fast and efficient local search algorithm for learning
finite statistical mixtures of exponential families such as Gaussian mixture
models. Mixture models are traditionally learned using the
expectation-maximization (EM) soft clustering technique that monotonically
increases the incomplete (expected complete) likelihood. Given prescribed
mixture weights, the hard clustering k-MLE algorithm iteratively assigns data
to the most likely weighted component and updates the component models using
Maximum Likelihood Estimators (MLEs). Using the duality between exponential
families and Bregman divergences, we prove that the local convergence of the
complete likelihood of k-MLE follows directly from the convergence of a dual
additively weighted Bregman hard clustering. The inner loop of k-MLE can be
implemented using any k-means heuristic like the celebrated Lloyd's batched
or Hartigan's greedy swap updates. We then show how to update the mixture
weights by minimizing a cross-entropy criterion, which amounts to setting each
weight to the relative proportion of points in its cluster, and reiterate the
mixture parameter update and mixture weight update processes until convergence.
Hard EM is interpreted as a special case of k-MLE when both the component
update and the weight update are performed successively in the inner loop. To
initialize k-MLE, we propose k-MLE++, a careful initialization of k-MLE
guaranteeing probabilistically a global bound on the best possible complete
likelihood.
Comment: 31 pages, Extends the preliminary paper presented at IEEE ICASSP 201
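The hard-clustering loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes univariate Gaussian components, uses a naive quantile initialization in place of k-MLE++, and updates the weights in the same loop as the component parameters, which the abstract identifies as the hard EM special case. The function name `hard_gmm_1d` and its parameters are hypothetical.

```python
import numpy as np

def hard_gmm_1d(x, k, n_iter=100):
    """Hard-assignment fit of a 1D Gaussian mixture (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumption: naive initialization from evenly spaced quantiles (not k-MLE++).
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var() + 1e-9)
    w = np.full(k, 1.0 / k)
    labels = None
    for _ in range(n_iter):
        # Assign each point to the most likely weighted component:
        # argmax_j  log w_j + log N(x_i | mu_j, var_j).
        log_p = (np.log(w)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        new_labels = log_p.argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the parameters are too
        labels = new_labels
        # Per-cluster maximum likelihood estimates of mean and variance.
        for j in range(k):
            pts = x[labels == j]
            if pts.size == 0:
                continue  # keep the old parameters of an empty cluster
            mu[j] = pts.mean()
            var[j] = max(pts.var(), 1e-6)  # floor avoids degenerate components
        # Weight update: relative proportion of points in each cluster.
        counts = np.bincount(labels, minlength=k)
        w = np.clip(counts / n, 1e-12, None)
        w = w / w.sum()
    return mu, var, w, labels
```

Calling `hard_gmm_1d(x, 2)` on a one-dimensional sample returns the fitted means, variances, weights, and the hard cluster label of each point. Separating the weight update into an outer loop, as the abstract describes for the general algorithm, would only require moving the last three statements of the loop body outside an inner assignment/update loop.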
Cramer-Rao Lower Bound and Information Geometry
This article focuses on an important piece of work by the world-renowned
Indian statistician Calyampudi Radhakrishna Rao. In 1945, C. R. Rao (then 25
years old) published a pathbreaking paper, which had a profound impact on
subsequent statistical research.
Comment: To appear in Connected at Infinity II: On the work of Indian
mathematicians (R. Bhatia and C.S. Rajan, Eds.), special volume of Texts and
Readings In Mathematics (TRIM), Hindustan Book Agency, 201
Entropic optimal transport is maximum-likelihood deconvolution
We give a statistical interpretation of entropic optimal transport by showing
that performing maximum-likelihood estimation for Gaussian deconvolution
corresponds to calculating a projection with respect to the entropic optimal
transport distance. This structural result gives theoretical support for the
wide adoption of these tools in the machine learning community.
Minimum Rates of Approximate Sufficient Statistics
Given a sufficient statistic for a parametric family of distributions, one
can estimate the parameter without access to the data. However, the memory or
code size for storing the sufficient statistic may nonetheless still be
prohibitive. Indeed, for independent samples drawn from a -nomial
distribution with degrees of freedom, the length of the code scales as
. In many applications, we may not have a useful notion of
sufficient statistics (e.g., when the parametric family is not an exponential
family) and we also may not need to reconstruct the generating distribution
exactly. By adopting a Shannon-theoretic approach in which we allow a small
error in estimating the generating distribution, we construct various
approximate sufficient statistics and show that the code length can be reduced
to . We consider errors measured according to the
relative entropy and variational distance criteria. For the code constructions,
we leverage Rissanen's minimum description length principle, which yields a
non-vanishing error measured according to the relative entropy. For the
converse parts, we use Clarke and Barron's formula for the relative entropy of
a parametrized distribution and the corresponding mixture distribution.
However, this method only yields a weak converse for the variational distance.
We develop new techniques to achieve vanishing errors and we also prove strong
converses. The latter means that even if the code is allowed to have a
non-vanishing error, its length must still be at least .
Comment: To appear in the IEEE Transactions on Information Theory
Uncovering latent structure in valued graphs: A variational approach
As more and more network-structured data sets become available, the statistical
analysis of valued graphs has become commonplace. Looking for a latent
structure is one of the many strategies used to better understand the behavior
of a network. Several methods already exist for the binary case. We present a
model-based strategy to uncover groups of nodes in valued graphs. This
framework can be used for a wide range of parametric random graph models and
allows covariates to be included. Variational tools allow us to achieve approximate
maximum likelihood estimation of the parameters of these models. We provide a
simulation study showing that our estimation method performs well over a broad
range of situations. We apply this method to analyze host-parasite interaction
networks in forest ecosystems.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS361 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)