Robustly Learning Mixtures of Arbitrary Gaussians
We give a polynomial-time algorithm for the problem of robustly estimating a
mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the
presence of a constant fraction of arbitrary corruptions. This resolves the
main open problem in several previous works on algorithmic robust statistics,
which addressed the special cases of robustly estimating (a) a single Gaussian,
(b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of
two Gaussians. Our main tools are an efficient \emph{partial clustering}
algorithm that relies on the sum-of-squares method, and a novel \emph{tensor
decomposition} algorithm that allows errors in both Frobenius norm and low-rank
terms.
Comment: This version extends the previous one to yield 1) a robust proper
learning algorithm with poly(eps) error and 2) an information-theoretic
argument proving that the same algorithms in fact also yield parameter
recovery guarantees. The updates are included in Sections 7, 8, and 9, and the
main result from the previous version (Thm 1.4) is presented and proved in
Section
Settling the Robust Learnability of Mixtures of Gaussians
This work represents a natural coalescence of two important lines of work:
learning mixtures of Gaussians and algorithmic robust statistics. In particular
we give the first provably robust algorithm for learning mixtures of any
constant number of Gaussians. We require only mild assumptions on the mixing
weights (bounded fractionality) and that the total variation distance between
components is bounded away from zero. At the heart of our algorithm is a new
method for proving dimension-independent polynomial identifiability through
applying a carefully chosen sequence of differential operations to certain
generating functions that not only encode the parameters we would like to learn
but also the system of polynomial equations we would like to solve. We show how
the symbolic identities we derive can be directly used to analyze a natural
sum-of-squares relaxation.
Estimating Gaussian mixtures using sparse polynomial moment systems
The method of moments is a statistical technique for density estimation that
solves a system of moment equations to estimate the parameters of an unknown
distribution. A fundamental question critical to understanding identifiability
asks how many moment equations are needed to get finitely many solutions and
how many solutions there are. We answer this question for classes of Gaussian
mixture models using the tools of polyhedral geometry. Using these results, we
present an algorithm that performs parameter recovery, and therefore density
estimation, for high-dimensional Gaussian mixture models that scales linearly
in the dimension.
Comment: 30 page