
    Estimating Mixture Entropy with Pairwise Distances

    Mixture distributions arise in many parametric and non-parametric settings -- for example, in Gaussian mixture models and in non-parametric estimation. It is often necessary to compute the entropy of a mixture, but, in most cases, this quantity has no closed-form expression, making some form of approximation necessary. We propose a family of estimators based on a pairwise distance function between mixture components, and show that this estimator class has many attractive properties. For many distributions of interest, the proposed estimators are efficient to compute, differentiable in the mixture parameters, and become exact when the mixture components are clustered. We prove that this family includes lower and upper bounds on the mixture entropy. The Chernoff $\alpha$-divergence gives a lower bound when chosen as the distance function, with the Bhattacharyya distance providing the tightest lower bound for components that are symmetric and members of a location family. The Kullback-Leibler divergence gives an upper bound when used as the distance function. We provide closed-form expressions of these bounds for mixtures of Gaussians, and discuss their applications to the estimation of mutual information. Using numerical simulations, we then demonstrate that our bounds are significantly tighter than well-known existing bounds. This estimator class is very useful in optimization problems involving maximization/minimization of entropy and mutual information, such as MaxEnt and rate-distortion problems.
    Comment: Corrects several errata in the published version, in particular in Section V (bounds on mutual information).
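
    A minimal numerical sketch of these bounds for a Gaussian mixture, assuming the estimator family has the pairwise form $\hat H_D = \sum_i w_i H(p_i) - \sum_i w_i \log \sum_j w_j \exp(-D(p_i \| p_j))$; this form and the helper names below are my reading of the abstract, not a verbatim reproduction of the paper. With $D$ the Kullback-Leibler divergence the estimate is an upper bound, and with the Bhattacharyya distance a lower bound; the Gaussian closed forms used are the standard ones.

```python
import numpy as np

def gauss_entropy(cov):
    """Differential entropy of a multivariate Gaussian with covariance cov."""
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def kl_gauss(m0, c0, m1, c1):
    """Closed-form KL(N(m0,c0) || N(m1,c1))."""
    d = len(m0)
    c1_inv = np.linalg.inv(c1)
    diff = m1 - m0
    return 0.5 * (np.trace(c1_inv @ c0) + diff @ c1_inv @ diff - d
                  + np.linalg.slogdet(c1)[1] - np.linalg.slogdet(c0)[1])

def bhattacharyya_gauss(m0, c0, m1, c1):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    c = 0.5 * (c0 + c1)
    diff = m1 - m0
    return (0.125 * diff @ np.linalg.solve(c, diff)
            + 0.5 * (np.linalg.slogdet(c)[1]
                     - 0.5 * (np.linalg.slogdet(c0)[1] + np.linalg.slogdet(c1)[1])))

def pairwise_entropy_estimate(weights, means, covs, dist):
    """H_hat = sum_i w_i H(p_i) - sum_i w_i log sum_j w_j exp(-D(p_i || p_j))."""
    k = len(weights)
    comp_ent = sum(w * gauss_entropy(c) for w, c in zip(weights, covs))
    cross = 0.0
    for i in range(k):
        inner = sum(weights[j] * np.exp(-dist(means[i], covs[i], means[j], covs[j]))
                    for j in range(k))
        cross += weights[i] * np.log(inner)
    return comp_ent - cross

# Example: a two-component Gaussian mixture in 2D.
w = np.array([0.5, 0.5])
mus = [np.zeros(2), np.array([3.0, 0.0])]
sigmas = [np.eye(2), np.eye(2)]
upper = pairwise_entropy_estimate(w, mus, sigmas, kl_gauss)            # KL -> upper bound
lower = pairwise_entropy_estimate(w, mus, sigmas, bhattacharyya_gauss)  # Bhattacharyya -> lower bound
print(lower, "<= H(mixture) <=", upper)
```

    As the two components are pulled further apart, both estimates approach $\ln 2 + H(\mathcal N(0, I_2))$, consistent with the exactness-under-clustering property stated in the abstract.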

    Sample-Efficient Learning of Mixtures

    We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal{F}}(\epsilon)$, we provide a method for learning $\mathcal F^k$ with sample complexity $O(k\log k \cdot m_{\mathcal F}(\epsilon)/\epsilon^{2})$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper/agnostic, then the $\mathcal F^k$-learner is proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}(kd/\epsilon^{4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}(kd^{2}/\epsilon^{4})$, which improves the previously known bounds of $\widetilde{O}(k^{3}d^{2}/\epsilon^{4})$ and $\widetilde{O}(k^{4}d^{4}/\epsilon^{2})$ in their dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples.
    Comment: A bug from the previous version, which appeared in the AAAI 2018 proceedings, is fixed. 18 pages.
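
    As a quick sanity check on how the first corollary follows from the general theorem, assuming (this is my assumption, stated for illustration) that a single axis-aligned Gaussian in $\mathbb{R}^d$ is learnable in total variation with $\widetilde{O}(d/\epsilon^{2})$ samples:

```latex
% Worked instance of the mixture theorem (sketch).
% Assumption: m_F(eps) = O~(d/eps^2) for one axis-aligned Gaussian in R^d.
\[
  m_{\mathcal{F}^k}(\epsilon)
  \;=\; O\!\left(\frac{k\log k \cdot m_{\mathcal F}(\epsilon)}{\epsilon^{2}}\right)
  \;=\; \widetilde{O}\!\left(\frac{k\log k}{\epsilon^{2}} \cdot \frac{d}{\epsilon^{2}}\right)
  \;=\; \widetilde{O}\!\left(\frac{kd}{\epsilon^{4}}\right),
\]
% which matches the stated bound for mixtures of k axis-aligned Gaussians;
% the general-covariance bound follows the same way from m_F(eps) = O~(d^2/eps^2).
```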

    On Learning Mixtures of Well-Separated Gaussians

    We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians, and the goal is to estimate the means up to accuracy $\delta$ using $\mathrm{poly}(k,d,1/\delta)$ samples. In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k,d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds by showing the following results. 1. We show that with separation $o(\sqrt{\log k})$, super-polynomially many samples are required. In fact, this holds even when the $k$ means of the Gaussians are picked at random in $d=O(\log k)$ dimensions. 2. We show that with separation $\Omega(\sqrt{\log k})$, $\mathrm{poly}(k,d,1/\delta)$ samples suffice. Note that the bound on the separation is independent of $\delta$. This result is based on a new and efficient "accuracy boosting" algorithm that takes as input coarse estimates of the true means and, in time $\mathrm{poly}(k,d,1/\delta)$, outputs estimates of the means up to arbitrary accuracy $\delta$, assuming the separation between the means is $\Omega(\min\{\sqrt{\log k},\sqrt{d}\})$ (independently of $\delta$). We also present a computationally efficient algorithm in $d=O(1)$ dimensions with only $\Omega(\sqrt{d})$ separation. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.
    Comment: Appeared in FOCS 2017. 55 pages, 1 figure.
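
    A purely illustrative simulation of why $\sqrt{\log k}$ is the natural scale: it measures how often a sample from one component falls closer to another component's true mean as the separation varies. This per-sample confusion rate is only a proxy for the overlap between components, not the parameter-learning task the paper studies, and the code is not the paper's algorithm; the mean placement below is an assumption made for the demo.

```python
import numpy as np

def nearest_mean_confusion(k, d, sep, n_per_comp=200, seed=0):
    """Empirical rate at which a sample from a standard spherical Gaussian
    is closer to another component's mean, when means are ~sep apart.
    Illustrative proxy only; not a learning algorithm."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((k, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    means = dirs * sep / np.sqrt(2.0)   # random unit vectors are ~orthogonal in high d
    errs = 0
    for i in range(k):
        x = means[i] + rng.standard_normal((n_per_comp, d))
        nearest = np.argmin(((x[:, None, :] - means[None, :, :]) ** 2).sum(-1), axis=1)
        errs += np.sum(nearest != i)
    return errs / (k * n_per_comp)

k, d = 64, 64
for c in (0.5, 1.0, 2.0, 4.0):
    sep = c * np.sqrt(np.log(k))
    print(f"separation {c:.1f}*sqrt(log k): confusion ~ {nearest_mean_confusion(k, d, sep):.3f}")
```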

    List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians

    We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. List-Decodable Mean Estimation. Fix any $d \in \mathbb{Z}_+$ and $0 < \alpha < 1/2$. We design an algorithm with runtime $O(\mathrm{poly}(n/\alpha)^{d})$ that outputs a list of $O(1/\alpha)$ candidate vectors such that, with high probability, one of the candidates is within $\ell_2$-distance $O(\alpha^{-1/(2d)})$ of the true mean. The only previous algorithm for this problem achieved error $\tilde O(\alpha^{-1/2})$ under second moment conditions. For $d = O(1/\epsilon)$, our algorithm runs in polynomial time and achieves error $O(\alpha^{\epsilon})$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible. Learning Mixtures of Spherical Gaussians. We give a learning algorithm for mixtures of spherical Gaussians that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity-covariance Gaussians we obtain the following: for any $\epsilon>0$, if the pairwise separation between the means is at least $\Omega(k^{\epsilon}+\sqrt{\log(1/\delta)})$, our algorithm learns the unknown parameters within accuracy $\delta$ with sample complexity and running time $\mathrm{poly}(n, 1/\delta, (k/\epsilon)^{1/\epsilon})$. The previously best known polynomial-time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/\delta)$. Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
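
    The following is only a naive baseline meant to illustrate the list-decodable output format (a short list of candidate means, one of which should be close to the truth when only an $\alpha$ fraction of the points are inliers). It uses plain k-means clustering and has none of the guarantees of the degree-$d$ polynomial outlier-removal technique described above; all names and parameters are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def list_decode_means(points, alpha):
    """Naive baseline: return O(1/alpha) candidate means via k-means clustering.
    Illustrates the output format only; NOT the paper's algorithm."""
    n_candidates = int(np.ceil(2.0 / alpha))          # list size O(1/alpha)
    centroids, _ = kmeans2(points, n_candidates, minit='++')
    return centroids

# Toy data: an alpha fraction of inliers around an unknown mean, the rest junk.
rng = np.random.default_rng(1)
d, n, alpha = 10, 2000, 0.1
true_mean = np.full(d, 5.0)
n_in = int(alpha * n)
data = np.vstack([true_mean + rng.standard_normal((n_in, d)),
                  30.0 * rng.standard_normal((n - n_in, d))])
candidates = list_decode_means(data, alpha)
best = np.linalg.norm(candidates - true_mean, axis=1).min()
print(f"{len(candidates)} candidates; best l2 distance to the true mean: {best:.2f}")
```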

    A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians

    We show that, given data from a mixture of $k$ well-separated spherical Gaussians in $\mathbb{R}^d$, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, if the dimension is high ($d \gg \ln k$). We relate this to previous theoretical and empirical work on the EM algorithm.
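
    A compact EM sketch for a uniform mixture of spherical Gaussians, run for a small fixed number of rounds in the spirit of the two-round analysis above. This is a generic illustration, not the exact variant analyzed in the paper; in particular, the farthest-point initialization is my own choice, not the paper's.

```python
import numpy as np

def farthest_point_init(x, k, rng):
    """Greedy farthest-point initialization (illustrative choice, not the paper's)."""
    idx = [int(rng.integers(len(x)))]
    for _ in range(k - 1):
        d2 = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    return x[idx].copy()

def spherical_em(x, k, n_rounds=2, seed=0):
    """EM for a uniform mixture of k spherical Gaussians with a shared variance.
    Generic sketch; not the exact two-round variant analyzed in the paper."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    means, var = farthest_point_init(x, k, rng), 1.0
    for _ in range(n_rounds):
        # E-step: responsibilities under equal mixing weights.
        sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)   # (n, k)
        log_r = -0.5 * sq / var
        log_r -= log_r.max(axis=1, keepdims=True)                 # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate the means, then the shared spherical variance.
        means = (r.T @ x) / r.sum(axis=0)[:, None]
        sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        var = (r * sq).sum() / (n * d)
    return means, var

# Toy run: k well-separated unit-variance components in dimension d >> ln k.
rng = np.random.default_rng(1)
k, d, n_per = 5, 50, 400
true_means = 15.0 * np.eye(k, d)                      # pairwise distance 15*sqrt(2)
x = np.vstack([m + rng.standard_normal((n_per, d)) for m in true_means])
est_means, est_var = spherical_em(x, k, n_rounds=2)
errs = [np.linalg.norm(est_means - m, axis=1).min() for m in true_means]
print("max mean error:", round(max(errs), 3), " estimated variance:", round(est_var, 3))
```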

    A Tight Convex Upper Bound on the Likelihood of a Finite Mixture

    The likelihood function of a finite mixture model is a non-convex function with multiple local maxima, and commonly used iterative algorithms such as EM will converge to different solutions depending on the initial conditions. In this paper we ask: is it possible to assess how far we are from the global maximum of the likelihood? Since the likelihood of a finite mixture model can grow unboundedly by centering a Gaussian on a single datapoint and shrinking the covariance, we constrain the problem by assuming that the parameters of the individual models are members of a large discrete set (e.g., estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million possible means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization, and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected onto the discrete set) or any other mixture estimation algorithm. For any dataset, our method allows us to find a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood.
    Comment: icpr 201
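
    A sketch of one simple convex relaxation in this spirit, not necessarily the exact bound from the paper: when component parameters must come from a finite dictionary, maximizing the log-likelihood over mixture weights on the whole dictionary, with the "at most $k$ components" constraint dropped, is a concave problem, and its optimum upper-bounds the likelihood of every $k$-component mixture drawn from that dictionary. The fixed-point iteration below is the standard EM update on the weights alone, which moves toward the global optimum of this concave problem (possibly slowly); all names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def relaxed_loglik_upper_bound(x, cand_means, cand_sds, n_iters=500):
    """Upper bound on the max log-likelihood of any k-mixture whose components
    come from the given finite dictionary: optimize weights over the whole
    dictionary (a concave problem) via the EM fixed point on weights only.
    Illustrative relaxation; not necessarily the bound from the paper."""
    # Likelihood of each datapoint under each candidate component, shape (N, M).
    p = norm.pdf(x[:, None], loc=cand_means[None, :], scale=cand_sds[None, :])
    n, m = p.shape
    w = np.full(m, 1.0 / m)
    for _ in range(n_iters):
        resp = (w * p) / (w * p).sum(axis=1, keepdims=True)   # responsibilities
        w = resp.mean(axis=0)                                  # EM update on weights
    return np.log(p @ w).sum()

# Toy 1-D example: data from two Gaussians, dictionary = a grid of (mean, sd) pairs.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])
grid_means, grid_sds = np.meshgrid(np.linspace(-4, 4, 41), np.array([0.5, 1.0, 2.0]))
cand_means, cand_sds = grid_means.ravel(), grid_sds.ravel()
bound = relaxed_loglik_upper_bound(x, cand_means, cand_sds)
print("upper bound on the best 2-component log-likelihood over the grid:", round(bound, 2))
```

    Any candidate mixture found by EM (with its parameters projected onto the grid) can then be compared against this bound to gauge how far its log-likelihood is from the global optimum over the discrete set.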