
    Robustly Learning Mixtures of $k$ Arbitrary Gaussians

    We give a polynomial-time algorithm for the problem of robustly estimating a mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of two Gaussians. Our main tools are an efficient \emph{partial clustering} algorithm that relies on the sum-of-squares method, and a novel \emph{tensor decomposition} algorithm that allows errors in both Frobenius norm and low-rank terms.
    Comment: This version extends the previous one to yield 1) a robust proper learning algorithm with poly(eps) error and 2) an information-theoretic argument proving that the same algorithms in fact also yield parameter recovery guarantees. The updates are included in Sections 7, 8, and 9 and the main result from the previous version (Thm 1.4) is presented and proved in Section
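    The corruption model the abstract refers to can be made concrete with a small simulation. The sketch below is NOT the paper's sum-of-squares algorithm; it only sets up an $\epsilon$-corrupted mixture and runs a naive trimming baseline, whose fragility against smarter corruptions is what motivates the paper's partial clustering and robust tensor decomposition. All parameter choices (k=2, d=10, eps=0.1) are illustrative assumptions.

```python
# Minimal sketch of the eps-corruption model with a naive trimming baseline.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n, d, k, eps = 5000, 10, 2, 0.1

# Sample from a 2-component Gaussian mixture with well-separated means.
means = np.stack([np.zeros(d), 4.0 * np.ones(d)])
labels = rng.integers(0, k, size=n)
X = rng.standard_normal((n, d)) + means[labels]

# Adversarial model: an arbitrary eps-fraction of points is replaced.
m = int(eps * n)
X[:m] = 50.0 + 5.0 * rng.standard_normal((m, d))  # far-away corruptions

# Naive baseline: trim the points farthest from the overall mean, then fit
# EM. This happens to work against crude corruptions like the ones above,
# but fails against adversarial ones placed near the clean data.
dists = np.linalg.norm(X - X.mean(axis=0), axis=1)
keep = dists < np.quantile(dists, 1.0 - 2 * eps)  # drop 2*eps farthest
gmm = GaussianMixture(n_components=k, random_state=0).fit(X[keep])
print("estimated component means (rows):")
print(np.round(gmm.means_, 2))
```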

    Private Distribution Learning with Public Data: The View from Sample Compression

    We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal{Q}$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class $\mathcal{Q}$ is connected to the existence of a sample compression scheme for $\mathcal{Q}$, as well as to an intermediate notion we refer to as list learning. Leveraging this connection, we (1) approximately recover previous results on Gaussians over $\mathbb{R}^d$; and (2) obtain new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb{R}^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb{R}^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.
    Comment: 31 pages
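    A standard way to combine list learning with pure differential privacy, in the spirit of the connection the abstract describes, is to let the public samples produce a finite candidate list and then privately select among the candidates with the exponential mechanism. The sketch below is a generic one-dimensional illustration of that pattern, not the paper's construction; the score function, clipping bound, and all parameters are assumptions.

```python
# List learning + private selection: public data localizes the parameter,
# the exponential mechanism (pure eps-DP) picks a candidate privately.
import numpy as np

def exponential_mechanism(private, candidates, epsilon, rng):
    # Score = negative average clipped distance to the private data.
    # Clipping to [0, 1] bounds the sensitivity: changing one of the n
    # private samples moves any candidate's score by at most 1/n.
    scores = np.array([
        -np.mean(np.clip(np.abs(private - c), 0.0, 1.0)) for c in candidates
    ])
    sensitivity = 1.0 / len(private)
    logits = epsilon * (scores - scores.max()) / (2.0 * sensitivity)
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

rng = np.random.default_rng(1)
true_mean = 3.2
public = rng.normal(true_mean, 1.0, size=10)     # few public samples
private = rng.normal(true_mean, 1.0, size=2000)  # many private samples

# "List learning" step: public data yields a coarse finite candidate list.
candidates = public.mean() + np.linspace(-2.0, 2.0, 41)

est = exponential_mechanism(private, candidates, epsilon=1.0, rng=rng)
print(f"true mean {true_mean:.2f}, private estimate {est:.2f}")
```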

    Learning mixtures of separated nonspherical Gaussians

    Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial time. We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension. The Gaussians may have arbitrary shape, but they must satisfy a ``separation condition'' which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are ``distance concentration'' results, proved using isoperimetric inequalities, which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture. We also formalize the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.
    Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
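    The distance-concentration phenomenon the abstract invokes is easy to observe numerically: in high dimension, the distance between two points drawn from the mixture concentrates around one of a few values determined by which components the points came from, so pairwise distances alone reveal the clustering. The sketch below is a quick simulation under assumed parameters (spherical components, d=500, separation scaling with sqrt(d)), not the paper's analysis or constants.

```python
# Numerical illustration of distance concentration in a Gaussian mixture.
import numpy as np

rng = np.random.default_rng(2)
d = 500                    # high dimension makes concentration visible
sep = 2.0 * np.sqrt(d)     # center separation scaling with sqrt(d)

# Two spherical components (the paper handles nonspherical ones).
a = rng.standard_normal((2000, d))   # component 1, mean 0
b = rng.standard_normal((2000, d))   # component 2 ...
b[:, 0] += sep                       # ... mean shifted along axis 0

within = np.linalg.norm(a[:1000] - a[1000:], axis=1)  # same component
across = np.linalg.norm(a[:1000] - b[:1000], axis=1)  # different components

for name, dist in [("within", within), ("across", across)]:
    print(f"{name:7s} mean {dist.mean():8.1f}  std {dist.std():.2f}")
# Both distance distributions have O(1) standard deviation around widely
# separated means, so thresholding pairwise distances recovers the clusters.
```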