Estimating Mixture Entropy with Pairwise Distances
Mixture distributions arise in many parametric and non-parametric settings --
for example, in Gaussian mixture models and in non-parametric estimation. It is
often necessary to compute the entropy of a mixture, but, in most cases, this
quantity has no closed-form expression, making some form of approximation
necessary. We propose a family of estimators based on a pairwise distance
function between mixture components, and show that this estimator class has
many attractive properties. For many distributions of interest, the proposed
estimators are efficient to compute, differentiable in the mixture parameters,
and become exact when the mixture components are clustered. We prove this
family includes lower and upper bounds on the mixture entropy. The Chernoff
$\alpha$-divergence gives a lower bound when chosen as the distance function,
with the Bhattacharyya distance providing the tightest lower bound for
components that are symmetric and members of a location family. The
Kullback-Leibler divergence gives an upper bound when used as the distance
function. We provide closed-form expressions of these bounds for mixtures of
Gaussians, and discuss their applications to the estimation of mutual
information. We then demonstrate that our bounds are significantly tighter than
well-known existing bounds using numeric simulations. This estimator class is
very useful in optimization problems involving maximization/minimization of
entropy and mutual information, such as MaxEnt and rate distortion problems.
Comment: Corrects several errata in published version, in particular in
Section V (bounds on mutual information)
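The pairwise-distance idea in the abstract above can be illustrated with a minimal sketch for one-dimensional Gaussian mixtures. The abstract does not state the estimator's formula; the form used here, `sum_i w_i H(p_i) - sum_i w_i log sum_j w_j exp(-D(p_i, p_j))`, is an assumption about how such an estimator is usually written, and all function names are hypothetical. Plugging in the Bhattacharyya distance should then give a lower bound and the Kullback-Leibler divergence an upper bound, as the abstract claims.

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy (in nats) of a 1-D Gaussian with variance `var`."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def kl_divergence(mu, var):
    """Matrix whose (i, j) entry is KL(p_i || p_j) for 1-D Gaussians."""
    mi, mj = mu[:, None], mu[None, :]
    vi, vj = var[:, None], var[None, :]
    return 0.5 * (np.log(vj / vi) + (vi + (mi - mj) ** 2) / vj - 1.0)

def bhattacharyya_distance(mu, var):
    """Matrix of Bhattacharyya distances between 1-D Gaussians."""
    mi, mj = mu[:, None], mu[None, :]
    vi, vj = var[:, None], var[None, :]
    return ((mi - mj) ** 2 / (4.0 * (vi + vj))
            + 0.5 * np.log((vi + vj) / (2.0 * np.sqrt(vi * vj))))

def pairwise_entropy_estimate(weights, mu, var, distance):
    """Assumed estimator: component entropies plus a pairwise mixing term."""
    d = distance(mu, var)
    component_term = np.sum(weights * gaussian_entropy(var))
    # Row i of exp(-d) @ weights is sum_j w_j * exp(-D(p_i, p_j)).
    mixing_term = -np.sum(weights * np.log(np.exp(-d) @ weights))
    return component_term + mixing_term

# Illustrative mixture: two equally weighted unit-variance components.
weights = np.array([0.5, 0.5])
mu = np.array([0.0, 3.0])
var = np.array([1.0, 1.0])

lower = pairwise_entropy_estimate(weights, mu, var, bhattacharyya_distance)
upper = pairwise_entropy_estimate(weights, mu, var, kl_divergence)
print(f"lower bound: {lower:.4f}  upper bound: {upper:.4f}")
```

Both estimates are built from smooth NumPy operations on the means and variances, which is consistent with the abstract's remark that the estimators are differentiable in the mixture parameters; when all components coincide, every pairwise distance is zero and the estimate reduces to the entropy of a single component.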
Sample-Efficient Learning of Mixtures
We consider PAC learning of probability distributions (a.k.a. density
estimation), where we are given an i.i.d. sample generated from an unknown
target distribution, and want to output a distribution that is close to the
target in total variation distance. Let be an arbitrary class of
probability distributions, and let denote the class of
-mixtures of elements of . Assuming the existence of a method
for learning with sample complexity ,
we provide a method for learning with sample complexity
. Our mixture
learning algorithm has the property that, if the -learner is
proper/agnostic, then the -learner would be proper/agnostic as
well.
This general result enables us to improve the best known sample complexity
upper bounds for a variety of important mixture classes. First, we show that
the class of mixtures of axis-aligned Gaussians in is
PAC-learnable in the agnostic setting with
samples, which is tight in and up to logarithmic factors. Second, we
show that the class of mixtures of Gaussians in is
PAC-learnable in the agnostic setting with sample complexity
, which improves the previous known
bounds of and
in its dependence on and . Finally,
we show that the class of mixtures of log-concave distributions over
is PAC-learnable using
samples.
Comment: A bug from the previous version, which appeared in AAAI 2018
proceedings, is fixed. 18 pages
On Learning Mixtures of Well-Separated Gaussians
We consider the problem of efficiently learning mixtures of a large number of
spherical Gaussians, when the components of the mixture are well separated. In
the most basic form of this problem, we are given samples from a uniform
mixture of standard spherical Gaussians, and the goal is to estimate the
means up to accuracy using samples.
In this work, we study the following question: what is the minimum separation
needed between the means for solving this task? The best known algorithm due to
Vempala and Wang [JCSS 2004] requires a separation of roughly
. On the other hand, Moitra and Valiant [FOCS 2010] showed
that with separation , exponentially many samples are required. We
address the significant gap between these two bounds, by showing the following
results.
1. We show that with separation , super-polynomially many
samples are required. In fact, this holds even when the means of the
Gaussians are picked at random in dimensions.
2. We show that with separation ,
samples suffice. Note that the bound on the separation is independent of
. This result is based on a new and efficient "accuracy boosting"
algorithm that takes as input coarse estimates of the true means and in time
outputs estimates of the means up to arbitrary accuracy
assuming the separation between the means is (independently of ).
We also present a computationally efficient algorithm in dimensions
with only separation. These results together essentially
characterize the optimal order of separation between components that is needed
to learn a mixture of spherical Gaussians with polynomial samples.
Comment: Appeared in FOCS 2017. 55 pages, 1 figure
List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians
We study the problem of list-decodable Gaussian mean estimation and the
related problem of learning mixtures of separated spherical Gaussians. We
develop a set of techniques that yield new efficient algorithms with
significantly improved guarantees for these problems.
List-Decodable Mean Estimation. Fix any and . We design an algorithm with
runtime that outputs a list of many
candidate vectors such that with high probability one of the candidates is
within -distance from the true mean. The only
previous algorithm for this problem achieved error
under second moment conditions. For , our algorithm runs in
polynomial time and achieves error . We also give a
Statistical Query lower bound suggesting that the complexity of our algorithm
is qualitatively close to best possible.
Learning Mixtures of Spherical Gaussians. We give a learning algorithm
for mixtures of spherical Gaussians that succeeds under significantly weaker
separation assumptions compared to prior work. For the prototypical case of a
uniform mixture of identity covariance Gaussians we obtain: For any
, if the pairwise separation between the means is at least
, our algorithm learns the unknown
parameters within accuracy with sample complexity and running time
. The previously best
known polynomial time algorithm required separation at least .
Our main technical contribution is a new technique, using degree-
multivariate polynomials, to remove outliers from high-dimensional datasets
where the majority of the points are corrupted.
A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians
We show that, given data from a mixture of k well-separated spherical Gaussians in ℜ^d, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, if the dimension is high (d >> ln k). We relate this to previous theoretical and empirical work on the EM algorithm.
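To make the setting of the abstract above concrete, here is a minimal sketch of standard EM for a mixture of spherical Gaussians, run for two rounds. It is not the paper's two-round variant, whose details the abstract does not give; the initialization from coarse mean estimates, the function name `em_spherical`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def em_spherical(x, init_means, n_rounds=2):
    """Plain EM for a mixture of spherical Gaussians (illustrative sketch)."""
    n, d = x.shape
    k = init_means.shape[0]
    means = init_means.copy()
    weights = np.full(k, 1.0 / k)
    variances = np.full(k, x.var())  # crude common starting variance
    for _ in range(n_rounds):
        # E-step: responsibilities computed in log space for stability.
        sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_r = (np.log(weights)
                 - 0.5 * d * np.log(2.0 * np.pi * variances)
                 - 0.5 * sq / variances)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and per-component variances.
        nk = r.sum(axis=0) + 1e-12
        weights = nk / n
        means = (r.T @ x) / nk[:, None]
        sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (r * sq).sum(axis=0) / (d * nk)
    return weights, means, variances

# Synthetic, well-separated data in a fairly high dimension (assumed values).
rng = np.random.default_rng(0)
k, d, n = 3, 50, 600
true_means = 10.0 * np.eye(k, d)
labels = rng.integers(0, k, size=n)
x = true_means[labels] + rng.normal(size=(n, d))

# Coarse starting means: the true means perturbed by noise.
init_means = true_means + 0.5 * rng.normal(size=(k, d))
weights_hat, means_hat, variances_hat = em_spherical(x, init_means, n_rounds=2)
print(np.linalg.norm(means_hat - true_means, axis=1))
```

With components this far apart the responsibilities are nearly hard assignments after the first E-step, so the second round mostly refines the variances and weights.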
A Tight Convex Upper Bound on the Likelihood of a Finite Mixture
The likelihood function of a finite mixture model is a non-convex function
with multiple local maxima and commonly used iterative algorithms such as EM
will converge to different solutions depending on initial conditions. In this
paper we ask: is it possible to assess how far we are from the global maximum
of the likelihood? Since the likelihood of a finite mixture model can grow
unboundedly by centering a Gaussian on a single datapoint and shrinking the
covariance, we constrain the problem by assuming that the parameters of the
individual models are members of a large discrete set (e.g. estimating a
mixture of two Gaussians where the means and variances of both Gaussians are
members of a set of a million possible means and variances). For this setting
we show that a simple upper bound on the likelihood can be computed using
convex optimization and we analyze conditions under which the bound is
guaranteed to be tight. This bound can then be used to assess the quality of
solutions found by EM (where the final result is projected on the discrete set)
or any other mixture estimation algorithm. For any dataset our method allows us
to find a finite mixture model together with a dataset-specific bound on how
far the likelihood of this mixture is from the global optimum of the likelihood.
Comment: icpr 201
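The unboundedness mentioned in the abstract above is easy to reproduce numerically. The sketch below, with a made-up five-point dataset and a hypothetical helper `mixture_log_likelihood`, centers one component of a two-component 1-D Gaussian mixture on a single data point and shrinks its standard deviation; the log-likelihood grows without bound as the standard deviation approaches zero.

```python
import numpy as np

def mixture_log_likelihood(x, weights, means, stds):
    """Log-likelihood of 1-D data under a Gaussian mixture (log-sum-exp form)."""
    z = (x[:, None] - means[None, :]) / stds[None, :]
    log_comp = (np.log(weights) - np.log(stds)
                - 0.5 * np.log(2.0 * np.pi) - 0.5 * z ** 2)
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

# Made-up data; the first component sits exactly on the first data point.
x = np.array([-1.3, -0.4, 0.2, 0.9, 2.1])
weights = np.array([0.5, 0.5])
means = np.array([x[0], x.mean()])

log_likelihoods = []
for s in [1.0, 1e-2, 1e-4, 1e-6]:
    stds = np.array([s, 1.0])  # shrink only the first component
    ll = mixture_log_likelihood(x, weights, means, stds)
    log_likelihoods.append(ll)
    print(f"std = {s:g}  log-likelihood = {ll:.3f}")
```

Restricting the parameters to a finite candidate set, as the abstract describes, removes this degenerate direction because the variance can no longer shrink below the smallest value in the set.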