Robustly Learning Mixtures of $k$ Arbitrary Gaussians
We give a polynomial-time algorithm for the problem of robustly estimating a
mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the
presence of a constant fraction of arbitrary corruptions. This resolves the
main open problem in several previous works on algorithmic robust statistics,
which addressed the special cases of robustly estimating (a) a single Gaussian,
(b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of
two Gaussians. Our main tools are an efficient \emph{partial clustering}
algorithm that relies on the sum-of-squares method, and a novel \emph{tensor
decomposition} algorithm that allows errors in both Frobenius norm and low-rank
terms.

Comment: This version extends the previous one to yield (1) a robust proper learning algorithm with poly(eps) error, and (2) an information-theoretic argument proving that the same algorithms in fact also yield parameter recovery guarantees. The updates are included in Sections 7, 8, and 9, and the main result from the previous version (Thm 1.4) is presented and proved in Section
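To make the problem setup concrete, the following is a minimal Python sketch of the $\varepsilon$-corruption model the abstract refers to: it draws samples from a Gaussian mixture and lets an adversary overwrite an $\varepsilon$-fraction of them arbitrarily. This is only the estimation problem, not the paper's sum-of-squares algorithm; all names (e.g. sample_corrupted_mixture) are illustrative.

    # Sketch of the eps-corruption model for Gaussian mixtures (NOT the
    # paper's algorithm; names and parameter choices are illustrative).
    import numpy as np

    def sample_corrupted_mixture(n, d, eps, rng):
        """Draw n points in R^d from a 2-component Gaussian mixture,
        then let an adversary overwrite an eps-fraction arbitrarily."""
        means = [np.zeros(d), 3.0 * np.ones(d)]
        covs = [np.eye(d), 2.0 * np.eye(d)]        # arbitrary shapes allowed
        comps = rng.integers(0, 2, size=n)         # uniform mixing weights
        X = np.stack([rng.multivariate_normal(means[c], covs[c])
                      for c in comps])
        budget = int(eps * n)                      # adversary's budget
        idx = rng.choice(n, size=budget, replace=False)
        X[idx] = 1e6                               # arbitrary corruptions
        return X

    rng = np.random.default_rng(0)
    X = sample_corrupted_mixture(n=1000, d=5, eps=0.05, rng=rng)
    print(X.shape)   # (1000, 5); 5% of rows are adversarial outliers

A robust learner must recover the component means and covariances from X despite the corrupted rows.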
Private Distribution Learning with Public Data: The View from Sample Compression
We study the problem of private distribution learning with access to public
data. In this setup, which we refer to as public-private learning, the learner
is given public and private samples drawn from an unknown distribution $p$
belonging to a class $\mathcal{Q}$, with the goal of outputting an estimate of $p$
while adhering to privacy constraints (here, pure differential privacy)
only with respect to the private samples.
We show that the public-private learnability of a class $\mathcal{Q}$ is
connected to the existence of a sample compression scheme for $\mathcal{Q}$, as
well as to an intermediate notion we refer to as list learning. Leveraging this
connection, we (1) approximately recover previous results on Gaussians over $\mathbb{R}^d$; and (2) obtain new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb{R}^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb{R}^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.

Comment: 31 pages
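As a schematic illustration of how public data can interact with private selection, here is a toy Python sketch in one dimension: a handful of public samples induce a finite candidate list (a stand-in for the list-learning step), and the exponential mechanism privately selects among the candidates using the private samples. This is not the paper's construction; all names and parameter choices are hypothetical.

    # Toy public-private learner: public samples give a candidate list;
    # the exponential mechanism picks privately among the candidates.
    # Illustrative only -- not the paper's construction.
    import numpy as np

    def exponential_mechanism(scores, eps, sensitivity, rng):
        """Pick an index with probability prop. to exp(eps*score/(2*sens))."""
        logits = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
        logits -= logits.max()                    # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return rng.choice(len(scores), p=probs)

    def public_private_mean(public, private, eps, rng):
        candidates = np.sort(public)              # "list learning" step (toy)
        # Score: number of private points within +/-1 of the candidate.
        # Changing one private sample moves each count by at most 1,
        # so the sensitivity is 1.
        scores = [np.sum(np.abs(private - c) <= 1.0) for c in candidates]
        return candidates[exponential_mechanism(scores, eps, 1.0, rng)]

    rng = np.random.default_rng(1)
    public = rng.normal(4.2, 1.0, size=5)         # few public samples suffice
    private = rng.normal(4.2, 1.0, size=500)      # privacy only for these
    print(public_private_mean(public, private, eps=1.0, rng=rng))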
Learning mixtures of separated nonspherical Gaussians
Mixtures of Gaussian (or normal) distributions arise in a variety of
application areas. Many heuristics have been proposed for the task of finding
the component Gaussians given samples from the mixture, such as the EM
algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy.
Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial
time. We present the first algorithm that provably learns the component
Gaussians in time that is polynomial in the dimension. The Gaussians may have
arbitrary shape, but they must satisfy a ``separation condition'' which places
a lower bound on the distance between the centers of any two component
Gaussians. The mathematical results at the heart of our proof are ``distance
concentration'' results--proved using isoperimetric inequalities--which
establish bounds on the probability distribution of the distance between a pair
of points generated according to the mixture. We also formalize the more
general problem of max-likelihood fit of a Gaussian mixture to unstructured
data.

Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
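For a concrete picture of the setting, here is a small Python sketch (assuming numpy and scikit-learn) that samples from two well-separated nonspherical Gaussians and fits them with the EM heuristic mentioned above, via scikit-learn's GaussianMixture; the separation check is an illustrative stand-in for the paper's condition, not its exact form.

    # Separated-nonspherical-mixture setup, fitted with EM
    # (scikit-learn's GaussianMixture). The separation check is an
    # illustrative stand-in for the paper's condition.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    d = 10
    mu0, mu1 = np.zeros(d), 8.0 * np.ones(d)      # well-separated centers
    cov0 = np.diag(rng.uniform(0.5, 2.0, size=d)) # nonspherical covariances
    cov1 = np.diag(rng.uniform(0.5, 2.0, size=d))

    # Illustrative check: center distance large relative to the largest
    # standard deviations of the two components.
    sep = np.linalg.norm(mu0 - mu1)
    assert sep > 4 * (np.sqrt(cov0.max()) + np.sqrt(cov1.max()))

    X = np.vstack([rng.multivariate_normal(mu0, cov0, size=500),
                   rng.multivariate_normal(mu1, cov1, size=500)])

    gm = GaussianMixture(n_components=2, covariance_type="full",
                         random_state=0).fit(X)   # EM, as in Dempster et al.
    print(np.round(gm.means_.mean(axis=1), 2))    # ~0.0 and 8.0, in some order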