521 research outputs found
The intersection of two halfspaces has high threshold degree
The threshold degree of a Boolean function f:{0,1}^n->{-1,+1} is the least
degree of a real polynomial p such that f(x)=sgn p(x). We construct two
halfspaces on {0,1}^n whose intersection has threshold degree Theta(sqrt n), an
exponential improvement on previous lower bounds. This solves an open problem
due to Klivans (2002) and rules out the use of perceptron-based techniques for
PAC learning the intersection of two halfspaces, a central unresolved challenge
in computational learning. We also prove that the intersection of two majority
functions has threshold degree Omega(log n), which is tight and settles a
conjecture of O'Donnell and Servedio (2003).
Our proof consists of two parts. First, we show that for any nonconstant
Boolean functions f and g, the intersection f(x)^g(y) has threshold degree O(d)
if and only if ||f-F||_infty + ||g-G||_infty < 1 for some rational functions F,
G of degree O(d). Second, we settle the least degree required for approximating
a halfspace and a majority function to any given accuracy by rational
functions.
Our technique further allows us to make progress on Aaronson's challenge
(2008) and contribute strong direct product theorems for polynomial
representations of composed Boolean functions of the form F(f_1,...,f_n). In
particular, we give an improved lower bound on the approximate degree of the
AND-OR tree.Comment: Full version of the FOCS'09 pape
Moment-Matching Polynomials
We give a new framework for proving the existence of low-degree, polynomial
approximators for Boolean functions with respect to broad classes of
non-product distributions. Our proofs use techniques related to the classical
moment problem and deviate significantly from known Fourier-based methods,
which require the underlying distribution to have some product structure.
Our main application is the first polynomial-time algorithm for agnostically
learning any function of a constant number of halfspaces with respect to any
log-concave distribution (for any constant accuracy parameter). This result was
not known even for the case of learning the intersection of two halfspaces
without noise. Additionally, we show that in the "smoothed-analysis" setting,
the above results hold with respect to distributions that have sub-exponential
tails, a property satisfied by many natural and well-studied distributions in
machine learning.
Given that our algorithms can be implemented using Support Vector Machines
(SVMs) with a polynomial kernel, these results give a rigorous theoretical
explanation as to why many kernel methods work so well in practice
Bounded Independence Fools Degree-2 Threshold Functions
Let x be a random vector coming from any k-wise independent distribution over
{-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is
determined up to an additive epsilon for k = poly(1/epsilon). This answers an
open question of Diakonikolas et al. (FOCS 2009). Using standard constructions
of k-wise independent distributions, we obtain a broad class of explicit
generators that epsilon-fool the class of degree-2 threshold functions with
seed length log(n)*poly(1/epsilon).
Our approach is quite robust: it easily extends to yield that the
intersection of any constant number of degree-2 threshold functions is
epsilon-fooled by poly(1/epsilon)-wise independence. Our results also hold if
the entries of x are k-wise independent standard normals, implying for example
that bounded independence derandomizes the Goemans-Williamson hyperplane
rounding scheme.
To achieve our results, we introduce a technique we dub multivariate
FT-mollification, a generalization of the univariate form introduced by Kane et
al. (SODA 2010) in the context of streaming algorithms. Along the way we prove
a generalized hypercontractive inequality for quadratic forms which takes the
operator norm of the associated matrix into account. These techniques may be of
independent interest.Comment: Using v1 numbering: removed Lemma G.5 from the Appendix (it was
wrong). Net effect is that Theorem G.6 reduces the m^6 dependence of Theorem
8.1 to m^4, not m^
Bounds on the Complexity of Halfspace Intersections when the Bounded Faces have Small Dimension
We study the combinatorial complexity of D-dimensional polyhedra defined as
the intersection of n halfspaces, with the property that the highest dimension
of any bounded face is much smaller than D. We show that, if d is the maximum
dimension of a bounded face, then the number of vertices of the polyhedron is
O(n^d) and the total number of bounded faces of the polyhedron is O(n^d^2). For
inputs in general position the number of bounded faces is O(n^d). For any fixed
d, we show how to compute the set of all vertices, how to determine the maximum
dimension of a bounded face of the polyhedron, and how to compute the set of
bounded faces in polynomial time, by solving a polynomial number of linear
programs
Learning Geometric Concepts with Nasty Noise
We study the efficient learnability of geometric concept classes -
specifically, low-degree polynomial threshold functions (PTFs) and
intersections of halfspaces - when a fraction of the data is adversarially
corrupted. We give the first polynomial-time PAC learning algorithms for these
concept classes with dimension-independent error guarantees in the presence of
nasty noise under the Gaussian distribution. In the nasty noise model, an
omniscient adversary can arbitrarily corrupt a small fraction of both the
unlabeled data points and their labels. This model generalizes well-studied
noise models, including the malicious noise model and the agnostic (adversarial
label noise) model. Prior to our work, the only concept class for which
efficient malicious learning algorithms were known was the class of
origin-centered halfspaces.
Specifically, our robust learning algorithm for low-degree PTFs succeeds
under a number of tame distributions -- including the Gaussian distribution
and, more generally, any log-concave distribution with (approximately) known
low-degree moments. For LTFs under the Gaussian distribution, we give a
polynomial-time algorithm that achieves error , where
is the noise rate. At the core of our PAC learning results is an efficient
algorithm to approximate the low-degree Chow-parameters of any bounded function
in the presence of nasty noise. To achieve this, we employ an iterative
spectral method for outlier detection and removal, inspired by recent work in
robust unsupervised learning. Our aforementioned algorithm succeeds for a range
of distributions satisfying mild concentration bounds and moment assumptions.
The correctness of our robust learning algorithm for intersections of
halfspaces makes essential use of a novel robust inverse independence lemma
that may be of broader interest
- …