    The intersection of two halfspaces has high threshold degree

    The threshold degree of a Boolean function f: {0,1}^n → {-1,+1} is the least degree of a real polynomial p such that f(x) = sgn p(x) for all x. We construct two halfspaces on {0,1}^n whose intersection has threshold degree Θ(√n), an exponential improvement on previous lower bounds. This solves an open problem due to Klivans (2002) and rules out the use of perceptron-based techniques for PAC learning the intersection of two halfspaces, a central unresolved challenge in computational learning. We also prove that the intersection of two majority functions has threshold degree Ω(log n), which is tight and settles a conjecture of O'Donnell and Servedio (2003). Our proof consists of two parts. First, we show that for any nonconstant Boolean functions f and g, the intersection f(x) ∧ g(y) has threshold degree O(d) if and only if ‖f - F‖_∞ + ‖g - G‖_∞ < 1 for some rational functions F, G of degree O(d). Second, we determine the least degree required to approximate a halfspace and a majority function to any given accuracy by rational functions. Our technique further allows us to make progress on Aaronson's challenge (2008) and to prove strong direct product theorems for polynomial representations of composed Boolean functions of the form F(f_1, ..., f_n). In particular, we give an improved lower bound on the approximate degree of the AND-OR tree.

    Comment: Full version of the FOCS'09 paper.
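
    To make the definition of threshold degree concrete, here is a minimal brute-force checker that tests whether a polynomial p sign-represents f on {0,1}^n. The functions `sign_represents`, `xor`, `p_deg2` and `p_deg1` are hypothetical illustrations, not taken from the paper. The encoding "+1 when the bits differ" is likewise an assumed sign convention. XOR of two bits is not a halfspace, so no degree-1 polynomial sign-represents it, while the degree-2 polynomial below does.

```python
from itertools import product

def sgn(v: float) -> int:
    # Returns -1, 0 or +1; a 0 never matches f's values in {-1,+1}.
    return (v > 0) - (v < 0)

def sign_represents(p, f, n: int) -> bool:
    """True iff f(x) == sgn(p(x)) for every x in {0,1}^n (brute force)."""
    return all(sgn(p(x)) == f(x) for x in product((0, 1), repeat=n))

# Hypothetical example: XOR of two bits, encoded as +1 when the bits differ.
def xor(x):
    return 1 if x[0] != x[1] else -1

# Degree-2 polynomial: -0.5, 0.5, 0.5, -0.5 on inputs 00, 01, 10, 11.
def p_deg2(x):
    return x[0] + x[1] - 2 * x[0] * x[1] - 0.5

# Degree-1 candidate: fails on input 11, where it is positive.
def p_deg1(x):
    return x[0] + x[1] - 0.5

print(sign_represents(p_deg2, xor, 2))  # True
print(sign_represents(p_deg1, xor, 2))  # False
```

    Brute force costs 2^n evaluations, so it only serves to check small cases by hand. Lower bounds such as Θ(√n) require the analytic arguments described in the abstract.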

    Moment-Matching Polynomials

    We give a new framework for proving the existence of low-degree polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Since our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results offer a rigorous theoretical explanation of why many kernel methods work so well in practice.
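
    The link between low-degree polynomial approximators and SVMs with a polynomial kernel rests on a standard identity: a degree-d polynomial kernel is an inner product over all monomials of degree at most d. As a result, a linear classifier in kernel space is the sign of a low-degree polynomial in the original inputs. The sketch below checks this identity for d = 2. It assumes the common inhomogeneous kernel (c + ⟨x, y⟩)^d. The exact kernel and parameters used by the authors are not given here, and `explicit_features_deg2` is a hypothetical helper.

```python
import itertools
import math
import numpy as np

def poly_kernel(x, y, degree=2, c=1.0):
    # Inhomogeneous polynomial kernel (c + <x, y>)^degree (assumed form).
    return (c + np.dot(x, y)) ** degree

def explicit_features_deg2(x, c=1.0):
    # Feature map phi with <phi(x), phi(y)> = (c + <x, y>)^2:
    # constant term, scaled linear terms, squares, scaled cross terms.
    feats = [c]
    feats += [math.sqrt(2 * c) * xi for xi in x]
    feats += [xi * xi for xi in x]
    feats += [math.sqrt(2) * x[i] * x[j]
              for i, j in itertools.combinations(range(len(x)), 2)]
    return np.array(feats)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)
# The two numbers agree up to floating-point rounding.
print(poly_kernel(x, y), explicit_features_deg2(x) @ explicit_features_deg2(y))
```

    Expanding (c + Σ x_i y_i)^2 gives c^2 + 2c Σ x_i y_i + Σ x_i^2 y_i^2 + 2 Σ_{i<j} x_i x_j y_i y_j, which is term by term the inner product of the explicit features. The kernel avoids materializing the C(D + d, d) features.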

    Bounded Independence Fools Degree-2 Threshold Functions

    Let x be a random vector coming from any k-wise independent distribution over {-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is determined up to an additive epsilon for k = poly(1/epsilon). This answers an open question of Diakonikolas et al. (FOCS 2009). Using standard constructions of k-wise independent distributions, we obtain a broad class of explicit generators that epsilon-fool the class of degree-2 threshold functions with seed length log(n)*poly(1/epsilon). Our approach is quite robust: it easily extends to yield that the intersection of any constant number of degree-2 threshold functions is epsilon-fooled by poly(1/epsilon)-wise independence. Our results also hold if the entries of x are k-wise independent standard normals, implying for example that bounded independence derandomizes the Goemans-Williamson hyperplane rounding scheme. To achieve our results, we introduce a technique we dub multivariate FT-mollification, a generalization of the univariate form introduced by Kane et al. (SODA 2010) in the context of streaming algorithms. Along the way we prove a generalized hypercontractive inequality for quadratic forms which takes the operator norm of the associated matrix into account. These techniques may be of independent interest.

    Comment: Using v1 numbering: removed Lemma G.5 from the Appendix (it was wrong). Net effect is that Theorem G.6 reduces the m^6 dependence of Theorem 8.1 to m^4, not m^
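
    The abstract relies on "standard constructions of k-wise independent distributions." Below is a minimal sketch of one textbook linear construction over GF(2) for k = 3. It is not necessarily the construction the authors use. Coordinate i of x is (-1)^⟨v_i, s⟩, where s is an (m+1)-bit seed and v_i = (1, binary(i)). Any three distinct v_i are linearly independent over GF(2), so every three coordinates are uniform on {-1,1}^3, with seed length log2(n) + 1. The theorem needs k = poly(1/epsilon), which calls for other constructions, for example ones based on BCH codes. The helpers `generator` and `is_k_wise_uniform` are hypothetical.

```python
from itertools import combinations, product

def generator(seed_bits, m):
    """Map an (m+1)-bit seed s to x in {-1,1}^n with n = 2^m.

    Coordinate i outputs (-1)^<v_i, s> with v_i = (1, binary(i)) in GF(2)^(m+1).
    Any three distinct v_i are linearly independent, giving 3-wise independence.
    """
    s0, rest = seed_bits[0], seed_bits[1:]
    x = []
    for i in range(2 ** m):
        bits_i = [(i >> j) & 1 for j in range(m)]
        # <v_i, s> mod 2 = s0 XOR parity(binary(i) AND rest)
        parity = s0 ^ (sum(b & r for b, r in zip(bits_i, rest)) % 2)
        x.append(-1 if parity else 1)
    return x

def is_k_wise_uniform(samples, k):
    """Check that every k coordinates see all 2^k sign patterns equally often."""
    n = len(samples[0])
    for coords in combinations(range(n), k):
        counts = {}
        for x in samples:
            key = tuple(x[c] for c in coords)
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True

m = 3
# Uniform distribution over all 2^(m+1) seeds = the generator's output distribution.
support = [generator(list(s), m) for s in product((0, 1), repeat=m + 1)]
print(len(support), len(support[0]))   # 16 8
print(is_k_wise_uniform(support, 3))   # True
print(is_k_wise_uniform(support, 4))   # False
```

    The 4-wise check fails because v_0 + v_1 + v_2 + v_3 = 0 over GF(2). As a result, the product of those four coordinates is always +1, which shows that this construction is exactly 3-wise independent.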

    Bounds on the Complexity of Halfspace Intersections when the Bounded Faces have Small Dimension

    We study the combinatorial complexity of D-dimensional polyhedra defined as the intersection of n halfspaces, with the property that the highest dimension of any bounded face is much smaller than D. We show that, if d is the maximum dimension of a bounded face, then the number of vertices of the polyhedron is O(n^d) and the total number of bounded faces of the polyhedron is O(n^{d^2}). For inputs in general position the number of bounded faces is O(n^d). For any fixed d, we show how to compute the set of all vertices, how to determine the maximum dimension of a bounded face of the polyhedron, and how to compute the set of bounded faces in polynomial time, by solving a polynomial number of linear programs.
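
    As a point of reference for the vertex bound, the sketch below enumerates the vertices of {x : Ax ≤ b} by brute force. A vertex is a feasible point at which D linearly independent constraints are tight. This naive method solves one D × D system per D-subset of constraints, so it costs O(n^D) solves. It is not the LP-based method described in the abstract. The function `vertices` and the example polyhedron are hypothetical illustrations.

```python
from itertools import combinations
import numpy as np

def vertices(A, b, tol=1e-9):
    """Brute-force vertex enumeration for {x in R^D : A x <= b}."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n, D = A.shape
    found = set()
    for rows in combinations(range(n), D):
        sub = A[list(rows)]
        if np.linalg.matrix_rank(sub) < D:
            continue  # these tight constraints do not pin down a single point
        x = np.linalg.solve(sub, b[list(rows)])
        if np.all(A @ x <= b + tol):  # keep only feasible intersection points
            # Round, add 0.0 to turn -0.0 into 0.0, and store plain floats.
            found.add(tuple(float(v) for v in np.round(x, 9) + 0.0))
    return sorted(found)

# Unbounded polyhedron in R^2: x >= 0, y >= 0, x + y >= 1, written as A x <= b.
A = [[-1, 0], [0, -1], [-1, -1]]
b = [0, 0, -1]
print(vertices(A, b))  # [(0.0, 1.0), (1.0, 0.0)]
```

    In this example D = 2, and the bounded faces are the two vertices and the segment joining them, so d = 1. The two remaining edges are unbounded rays, and so is the 2-dimensional face itself.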

    Learning Geometric Concepts with Nasty Noise

    We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error O(ε), where ε is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.
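
    For orientation, the degree-0 and degree-1 Chow parameters of a function f are E[f(x)] and E[f(x) x_i]. The sketch below shows only the naive, noise-free empirical estimator of these quantities. It does not include the iterative spectral outlier removal that the abstract describes for nasty noise. It uses a known fact as a sanity check: for an origin-centered halfspace f(x) = sgn(⟨w, x⟩) with unit w under N(0, I), E[f(x) x] = sqrt(2/π) w and E[f(x)] = 0. The names `empirical_chow_deg1`, `dim` and `n_samples` are hypothetical.

```python
import numpy as np

def empirical_chow_deg1(X, y):
    """Naive estimates of E[f(x)] and E[f(x) x_i] from labeled samples.

    No outlier detection or removal: a nasty adversary corrupting a fraction
    of (X, y) can shift these sample means arbitrarily far.
    """
    return y.mean(), (y[:, None] * X).mean(axis=0)

rng = np.random.default_rng(0)
dim, n_samples = 5, 200_000
w = rng.standard_normal(dim)
w /= np.linalg.norm(w)                 # unit normal of an origin-centered halfspace
X = rng.standard_normal((n_samples, dim))
y = np.sign(X @ w)                     # f(x) = sgn(<w, x>); ties have probability 0

chow0, chow1 = empirical_chow_deg1(X, y)
# For unit w under N(0, I): E[f] = 0 and E[f(x) x] = sqrt(2/pi) * w.
print(chow0, np.linalg.norm(chow1 - np.sqrt(2 / np.pi) * w))  # both close to 0
```

    The identity follows from splitting x into its component along w and an independent orthogonal part: E[sgn(g) g] = E|g| = sqrt(2/π) for g ~ N(0, 1), and the orthogonal part has mean zero.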