Robustly Learning Mixtures of $k$ Arbitrary Gaussians
We give a polynomial-time algorithm for the problem of robustly estimating a
mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the
presence of a constant fraction of arbitrary corruptions. This resolves the
main open problem in several previous works on algorithmic robust statistics,
which addressed the special cases of robustly estimating (a) a single Gaussian,
(b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of
two Gaussians. Our main tools are an efficient \emph{partial clustering}
algorithm that relies on the sum-of-squares method, and a novel \emph{tensor
decomposition} algorithm that allows errors in both Frobenius norm and low-rank
terms.

Comment: This version extends the previous one to yield (1) a robust proper learning algorithm with poly(eps) error, and (2) an information-theoretic argument proving that the same algorithms in fact also yield parameter recovery guarantees. The updates are included in Sections 7, 8, and 9, and the main result from the previous version (Thm 1.4) is presented and proved in Section
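To make the problem setup concrete, the following is a minimal Python sketch of the $\varepsilon$-corruption model the abstract refers to: it draws samples from a Gaussian mixture and lets an adversary overwrite an $\varepsilon$-fraction of them arbitrarily. This is only the estimation problem, not the paper's sum-of-squares algorithm; all names (e.g. sample_corrupted_mixture) are illustrative.

    # Sketch of the eps-corruption model for Gaussian mixtures (NOT the
    # paper's algorithm; names and parameter choices are illustrative).
    import numpy as np

    def sample_corrupted_mixture(n, d, eps, rng):
        """Draw n points in R^d from a 2-component Gaussian mixture,
        then let an adversary overwrite an eps-fraction arbitrarily."""
        means = [np.zeros(d), 3.0 * np.ones(d)]
        covs = [np.eye(d), 2.0 * np.eye(d)]        # arbitrary shapes allowed
        comps = rng.integers(0, 2, size=n)         # uniform mixing weights
        X = np.stack([rng.multivariate_normal(means[c], covs[c])
                      for c in comps])
        budget = int(eps * n)                      # adversary's budget
        idx = rng.choice(n, size=budget, replace=False)
        X[idx] = 1e6                               # arbitrary corruptions
        return X

    rng = np.random.default_rng(0)
    X = sample_corrupted_mixture(n=1000, d=5, eps=0.05, rng=rng)
    print(X.shape)   # (1000, 5); 5% of rows are adversarial outliers

A robust learner must recover the component means and covariances from X despite the corrupted rows.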
Private Distribution Learning with Public Data: The View from Sample Compression
We study the problem of private distribution learning with access to public
data. In this setup, which we refer to as public-private learning, the learner
is given public and private samples drawn from an unknown distribution $p$
belonging to a class $\mathcal{Q}$, with the goal of outputting an estimate of $p$
while adhering to privacy constraints (here, pure differential privacy)
only with respect to the private samples.
We show that the public-private learnability of a class $\mathcal{Q}$ is
connected to the existence of a sample compression scheme for $\mathcal{Q}$, as
well as to an intermediate notion we refer to as list learning. Leveraging this
connection, we (1) approximately recover previous results on Gaussians over $\mathbb{R}^d$; and (2) obtain new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb{R}^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb{R}^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.

Comment: 31 pages
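As a schematic illustration of how public data can interact with private selection, here is a toy Python sketch in one dimension: a handful of public samples induce a finite candidate list (a stand-in for the list-learning step), and the exponential mechanism privately selects among the candidates using the private samples. This is not the paper's construction; all names and parameter choices are hypothetical.

    # Toy public-private learner: public samples give a candidate list;
    # the exponential mechanism picks privately among the candidates.
    # Illustrative only -- not the paper's construction.
    import numpy as np

    def exponential_mechanism(scores, eps, sensitivity, rng):
        """Pick an index with probability prop. to exp(eps*score/(2*sens))."""
        logits = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
        logits -= logits.max()                    # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return rng.choice(len(scores), p=probs)

    def public_private_mean(public, private, eps, rng):
        candidates = np.sort(public)              # "list learning" step (toy)
        # Score: number of private points within +/-1 of the candidate.
        # Changing one private sample moves each count by at most 1,
        # so the sensitivity is 1.
        scores = [np.sum(np.abs(private - c) <= 1.0) for c in candidates]
        return candidates[exponential_mechanism(scores, eps, 1.0, rng)]

    rng = np.random.default_rng(1)
    public = rng.normal(4.2, 1.0, size=5)         # few public samples suffice
    private = rng.normal(4.2, 1.0, size=500)      # privacy only for these
    print(public_private_mean(public, private, eps=1.0, rng=rng))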
Learning mixtures of separated nonspherical Gaussians
Mixtures of Gaussian (or normal) distributions arise in a variety of
application areas. Many heuristics have been proposed for the task of finding
the component Gaussians given samples from the mixture, such as the EM
algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy.
Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial
time. We present the first algorithm that provably learns the component
Gaussians in time that is polynomial in the dimension. The Gaussians may have
arbitrary shape, but they must satisfy a ``separation condition'' which places
a lower bound on the distance between the centers of any two component
Gaussians. The mathematical results at the heart of our proof are ``distance
concentration'' results--proved using isoperimetric inequalities--which
establish bounds on the probability distribution of the distance between a pair
of points generated according to the mixture. We also formalize the more
general problem of max-likelihood fit of a Gaussian mixture to unstructured
data.

Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
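For a concrete picture of the setting, here is a small Python sketch (assuming numpy and scikit-learn) that samples from two well-separated nonspherical Gaussians and fits them with the EM heuristic mentioned above, via scikit-learn's GaussianMixture; the separation check is an illustrative stand-in for the paper's condition, not its exact form.

    # Separated-nonspherical-mixture setup, fitted with EM
    # (scikit-learn's GaussianMixture). The separation check is an
    # illustrative stand-in for the paper's condition.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    d = 10
    mu0, mu1 = np.zeros(d), 8.0 * np.ones(d)      # well-separated centers
    cov0 = np.diag(rng.uniform(0.5, 2.0, size=d)) # nonspherical covariances
    cov1 = np.diag(rng.uniform(0.5, 2.0, size=d))

    # Illustrative check: center distance large relative to the largest
    # standard deviations of the two components.
    sep = np.linalg.norm(mu0 - mu1)
    assert sep > 4 * (np.sqrt(cov0.max()) + np.sqrt(cov1.max()))

    X = np.vstack([rng.multivariate_normal(mu0, cov0, size=500),
                   rng.multivariate_normal(mu1, cov1, size=500)])

    gm = GaussianMixture(n_components=2, covariance_type="full",
                         random_state=0).fit(X)   # EM, as in Dempster et al.
    print(np.round(gm.means_.mean(axis=1), 2))    # ~0.0 and 8.0, in some order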