
    On Learning Mixtures of Well-Separated Gaussians

    We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians, and the goal is to estimate the means up to accuracy $\delta$ using $\mathrm{poly}(k, d, 1/\delta)$ samples. In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k, d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds by showing the following results. 1. We show that with separation $o(\sqrt{\log k})$, super-polynomially many samples are required. In fact, this holds even when the $k$ means of the Gaussians are picked at random in $d = O(\log k)$ dimensions. 2. We show that with separation $\Omega(\sqrt{\log k})$, $\mathrm{poly}(k, d, 1/\delta)$ samples suffice. Note that the bound on the separation is independent of $\delta$. This result is based on a new and efficient "accuracy boosting" algorithm that takes as input coarse estimates of the true means and, in time $\mathrm{poly}(k, d, 1/\delta)$, outputs estimates of the means up to arbitrary accuracy $\delta$, assuming the separation between the means is $\Omega(\min\{\sqrt{\log k}, \sqrt{d}\})$ (independently of $\delta$). We also present a computationally efficient algorithm in $d = O(1)$ dimensions with only $\Omega(\sqrt{d})$ separation. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.
    Comment: Appeared in FOCS 2017. 55 pages, 1 figure
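    For intuition only, here is a minimal sketch of an iterative refinement loop in the spirit of the "accuracy boosting" idea described above: starting from coarse mean estimates, each sample is assigned to its nearest estimated mean and the means are re-averaged. This is a Lloyd-style heuristic, not the paper's actual algorithm or guarantees; the function name, iteration count, and synthetic data are illustrative assumptions.

    # Hypothetical sketch: refine coarse mean estimates for a uniform mixture of
    # spherical Gaussians by repeated nearest-mean assignment and re-averaging.
    # Not the paper's algorithm; for illustration of the refinement idea only.
    import numpy as np

    def refine_means(samples, coarse_means, n_iters=20):
        """samples: (n, d) array; coarse_means: (k, d) initial estimates."""
        means = np.array(coarse_means, dtype=float)
        for _ in range(n_iters):
            # Assign each sample to its nearest current mean estimate.
            dists = np.linalg.norm(samples[:, None, :] - means[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute each mean as the average of its assigned samples.
            for j in range(means.shape[0]):
                assigned = samples[labels == j]
                if len(assigned) > 0:
                    means[j] = assigned.mean(axis=0)
        return means

    # Toy usage: two well-separated spherical Gaussians in d = 10 dimensions.
    rng = np.random.default_rng(0)
    true_means = np.stack([np.zeros(10), 4.0 * np.ones(10) / np.sqrt(10)])
    X = np.concatenate([rng.normal(m, 1.0, size=(500, 10)) for m in true_means])
    coarse = true_means + rng.normal(0.0, 0.5, size=true_means.shape)
    est = refine_means(X, coarse)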

    Learning Mixtures of Gaussians in High Dimensions

    Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples drawn from a random one out of $k$ Gaussian distributions in $\mathbb{R}^n$, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when $n > \Omega(k^2)$ we give an algorithm that learns the parameters in polynomial running time and using a polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher-order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding the smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
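    For intuition only, here is a minimal sketch of forming the empirical third-order moment tensor $\hat{M}_3 = \frac{1}{n}\sum_i x_i \otimes x_i \otimes x_i$ from samples, the kind of object that moment/tensor-decomposition approaches like the one described above work with. The decomposition step itself (which exploits the structure given by Isserlis'/Wick's theorem) is omitted; the function name and synthetic data are illustrative assumptions.

    # Hypothetical sketch: estimate the third-order moment tensor of samples from
    # a Gaussian mixture. Only the moment estimation is shown, not the paper's
    # decomposition algorithm.
    import numpy as np

    def empirical_third_moment(samples):
        """samples: (n, d) array; returns the (d, d, d) average of x (x) x (x) x."""
        return np.einsum('ni,nj,nk->ijk', samples, samples, samples) / samples.shape[0]

    # Toy usage: samples from a mixture of two Gaussians in R^5.
    rng = np.random.default_rng(1)
    means = rng.normal(size=(2, 5))
    X = np.concatenate([rng.normal(m, 1.0, size=(1000, 5)) for m in means])
    M3 = empirical_third_moment(X)
    print(M3.shape)  # (5, 5, 5)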