Quantized Compressive K-Means
The recent framework of compressive statistical learning aims at designing
tractable learning algorithms that use only a heavily compressed
representation, or sketch, of massive datasets. Compressive K-Means (CKM) is such
a method: it estimates the centroids of data clusters from pooled, non-linear,
random signatures of the learning examples. While this approach significantly
reduces computational time on very large datasets, its digital implementation
wastes acquisition resources because the learning examples are compressed only
after the sensing stage. The present work generalizes the sketching procedure
initially defined in Compressive K-Means to a large class of periodic
nonlinearities including hardware-friendly implementations that compressively
acquire entire datasets. This idea is exemplified in a Quantized Compressive
K-Means procedure, a variant of CKM that leverages 1-bit universal quantization
(i.e. retaining the least significant bit of a standard uniform quantizer) as
the periodic sketch nonlinearity. Switching to this resource-efficient signature
(already standard in most acquisition schemes) has almost no impact on clustering
performance, as illustrated by numerical experiments.
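As a concrete picture of the sketching step, here is a minimal sketch in Python. It is not the authors' code: the function names, the dither model, and every parameter value below are illustrative assumptions. It pools 1-bit universally quantized random signatures of a toy dataset into a fixed-size summary; the CKM decoding step that recovers centroids from the sketch is omitted.

```python
import numpy as np

def universal_quantize(t, delta=1.0):
    """1-bit universal quantization: keep only the least significant bit
    of a uniform quantizer with step delta. As a function of t this is a
    square wave of period 2*delta, i.e. a periodic nonlinearity."""
    return (np.floor(t / delta).astype(np.int64) % 2).astype(np.float64)

def qckm_sketch(X, W, xi, delta=1.0):
    """Pool the quantized random signatures of all examples.
    X:  (n, d) dataset; W: (m, d) random frequency matrix;
    xi: (m,) random dither, drawn uniformly from [0, 2*delta)."""
    signatures = universal_quantize(X @ W.T + xi, delta)  # (n, m), in {0, 1}
    return signatures.mean(axis=0)  # (m,) sketch: one pass, fixed size

# Toy usage: a 2-cluster dataset compressed to an m-dimensional sketch.
rng = np.random.default_rng(0)
n, d, m = 10_000, 2, 128
X = np.vstack([rng.normal(c, 0.1, size=(n // 2, d)) for c in (-1.0, 1.0)])
W = rng.normal(scale=3.0, size=(m, d))   # random projection directions
xi = rng.uniform(0.0, 2.0, size=m)       # random dither, one per signature
z = qckm_sketch(X, W, xi)                # m numbers summarize all n examples
```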
On Learning Mixtures of Well-Separated Gaussians
We consider the problem of efficiently learning mixtures of a large number of
spherical Gaussians, when the components of the mixture are well separated. In
the most basic form of this problem, we are given samples from a uniform
mixture of $k$ standard spherical Gaussians, and the goal is to estimate the
means up to accuracy $\delta$ using $\mathrm{poly}(k, d, 1/\delta)$ samples.
In this work, we study the following question: what is the minimum separation
needed between the means for solving this task? The best known algorithm due to
Vempala and Wang [JCSS 2004] requires a separation of roughly
$\min\{k, d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed
that with separation $o(1)$, exponentially many samples are required. We
address the significant gap between these two bounds, by showing the following
results.
1. We show that with separation $o(\sqrt{\log k})$, super-polynomially many
samples are required. In fact, this holds even when the means of the
Gaussians are picked at random in $d = O(\log k)$ dimensions.
2. We show that with separation $\Omega(\sqrt{\log k})$,
$\mathrm{poly}(k, d, 1/\delta)$ samples suffice. Note that the bound on the
separation is independent of $\delta$. This result is based on a new and
efficient "accuracy boosting" algorithm that takes as input coarse estimates
of the true means and, in time $\mathrm{poly}(k, d, 1/\delta)$, outputs
estimates of the means up to arbitrary accuracy $\delta$, assuming the
separation between the means is $\Omega(\min\{\sqrt{\log k}, \sqrt{d}\})$
(independently of $\delta$).
We also present a computationally efficient algorithm in $d = O(1)$ dimensions
with only $\Omega(\sqrt{d})$ separation. These results together essentially
characterize the optimal order of separation between components that is needed
to learn a mixture of $k$ spherical Gaussians with polynomial samples.
Comment: Appeared in FOCS 2017. 55 pages, 1 figure.