List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians
We study the problem of list-decodable Gaussian mean estimation and the
related problem of learning mixtures of separated spherical Gaussians. We
develop a set of techniques that yield new efficient algorithms with
significantly improved guarantees for these problems.
{\bf List-Decodable Mean Estimation.} Fix any and . We design an algorithm with runtime that outputs a list of many
candidate vectors such that with high probability one of the candidates is
within -distance from the true mean. The only
previous algorithm for this problem achieved error
under second moment conditions. For , our algorithm runs in
polynomial time and achieves error . We also give a
Statistical Query lower bound suggesting that the complexity of our algorithm
is qualitatively close to best possible.
{\bf Learning Mixtures of Spherical Gaussians.} We give a learning algorithm
for mixtures of spherical Gaussians that succeeds under significantly weaker
separation assumptions compared to prior work. For the prototypical case of a
uniform mixture of identity covariance Gaussians we obtain: For any
, if the pairwise separation between the means is at least
, our algorithm learns the unknown
parameters within accuracy with sample complexity and running time
. The previously best
known polynomial time algorithm required separation at least .
Our main technical contribution is a new technique, using degree-
multivariate polynomials, to remove outliers from high-dimensional datasets
where the majority of the points are corrupted.
Bounded Independence Fools Degree-2 Threshold Functions
Let x be a random vector coming from any k-wise independent distribution over
{-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is
determined up to an additive epsilon for k = poly(1/epsilon). This answers an
open question of Diakonikolas et al. (FOCS 2009). Using standard constructions
of k-wise independent distributions, we obtain a broad class of explicit
generators that epsilon-fool the class of degree-2 threshold functions with
seed length log(n)*poly(1/epsilon).
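One standard construction of a k-wise independent distribution evaluates a random polynomial of degree less than k over a finite field. The sketch below is only an illustration of that idea, not the generator analyzed in the paper: it works over the integers modulo a prime p rather than over {-1,1}^n (obtaining exactly unbiased bits would need a field of characteristic 2), and the function names are hypothetical.

```python
import itertools
import random


def eval_poly_points(coeffs, n, p):
    """Evaluate the polynomial with coefficients `coeffs` (mod prime p) at 0..n-1."""
    assert n <= p, "need n distinct evaluation points in Z_p"
    values = []
    for x in range(n):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % p
        values.append(acc)
    return values


def random_kwise(n, k, p, rng=random):
    """Draw n values in Z_p that are k-wise independent, using k random coefficients."""
    coeffs = [rng.randrange(p) for _ in range(k)]
    return eval_poly_points(coeffs, n, p)


def is_kwise_uniform(n, k, p):
    """Exhaustively check that every k positions are jointly uniform over Z_p^k."""
    for positions in itertools.combinations(range(n), k):
        counts = {}
        for coeffs in itertools.product(range(p), repeat=k):
            vals = eval_poly_points(list(coeffs), n, p)
            key = tuple(vals[i] for i in positions)
            counts[key] = counts.get(key, 0) + 1
        # Uniform means each of the p**k outcomes appears exactly once.
        if len(counts) != p ** k or set(counts.values()) != {1}:
            return False
    return True
```

The seed here is the k coefficients, so it takes about k*log(p) random bits, which is the source of the short seed length in constructions of this kind.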
Our approach is quite robust: it easily extends to yield that the
intersection of any constant number of degree-2 threshold functions is
epsilon-fooled by poly(1/epsilon)-wise independence. Our results also hold if
the entries of x are k-wise independent standard normals, implying for example
that bounded independence derandomizes the Goemans-Williamson hyperplane
rounding scheme.
To achieve our results, we introduce a technique we dub multivariate
FT-mollification, a generalization of the univariate form introduced by Kane et
al. (SODA 2010) in the context of streaming algorithms. Along the way we prove
a generalized hypercontractive inequality for quadratic forms which takes the
operator norm of the associated matrix into account. These techniques may be of
independent interest.
Comment: Using v1 numbering: removed Lemma G.5 from the Appendix (it was
wrong). Net effect is that Theorem G.6 reduces the m^6 dependence of Theorem
8.1 to m^4, not m^
What determines self-employment? : a comparative study
This article is a comparative study of the incidence of self-employment (SE) in Greece, which has the highest rate of SE in the European Union, and the United Kingdom, which has among the lowest. Data from the Greek and UK Labour Force Surveys are used to assess how an individual's personal attributes affect the incidence of SE. Common patterns exist between the two countries. In both, males have greater odds of being self-employed than females, older people have greater odds than younger people, individuals employed in the primary and tertiary sectors have greater odds than those employed in the secondary sector, and individuals with primary or secondary education have greater odds than individuals holding higher degrees. The incidence of SE also differs according to the individual's occupation. On the other hand, the findings indicate that individuals residing in London have greater odds of being self-employed than individuals working outside the UK's capital, whereas in Greece the pattern is reversed.
Sharp Bounds for Generalized Uniformity Testing
We study the problem of generalized uniformity testing \cite{BC17} of a
discrete probability distribution: Given samples from a probability
distribution over an {\em unknown} discrete domain , we
want to distinguish, with probability at least , between the case that
is uniform on some {\em subset} of versus -far, in
total variation distance, from any such uniform distribution.
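To make the quantity in this definition concrete, the following sketch computes the total variation distance from a fully known distribution to the nearest distribution that is uniform on some subset of its domain. It is an assumption-laden illustration only: the tester in the paper sees just samples from an unknown domain, whereas this brute-force helper (with hypothetical names) takes the probabilities as input and enumerates subsets, so it suits tiny domains.

```python
from itertools import combinations


def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def distance_to_uniform_on_subset(p):
    """Smallest TV distance from p to a uniform distribution on a nonempty subset.

    Brute force over all subsets of the keys of p; exponential in the domain size.
    """
    domain = list(p)
    best = 1.0
    for size in range(1, len(domain) + 1):
        for subset in combinations(domain, size):
            uniform = {x: 1.0 / size for x in subset}
            best = min(best, tv_distance(p, uniform))
    return best
```

A distribution is in the "uniform" case exactly when this distance is zero, and in the "far" case when it is at least the distance parameter of the test.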
We establish tight bounds on the sample complexity of generalized uniformity
testing. In more detail, we present a computationally efficient tester whose
sample complexity is optimal, up to constant factors, and a matching
information-theoretic lower bound. Specifically, we show that the sample
complexity of generalized uniformity testing is
- …