542 research outputs found

    From average case complexity to improper learning complexity

    Full text link
    The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.k.a. representation independent learning).The difficulty in proving lower bounds for improper learning is that the standard reductions from NP\mathbf{NP}-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far reaching implications. In particular, 1. Learning DNF\mathrm{DNF}'s is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of ω(1)\omega(1) halfspaces is hard.Comment: 34 page

    Moment-Matching Polynomials

    Full text link
    We give a new framework for proving the existence of low-degree, polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Given that our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results give a rigorous theoretical explanation as to why many kernel methods work so well in practice

    Learning Geometric Concepts with Nasty Noise

    Full text link
    We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error O(ϵ)O(\epsilon), where ϵ\epsilon is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest

    The intersection of two halfspaces has high threshold degree

    Full text link
    The threshold degree of a Boolean function f:{0,1}^n->{-1,+1} is the least degree of a real polynomial p such that f(x)=sgn p(x). We construct two halfspaces on {0,1}^n whose intersection has threshold degree Theta(sqrt n), an exponential improvement on previous lower bounds. This solves an open problem due to Klivans (2002) and rules out the use of perceptron-based techniques for PAC learning the intersection of two halfspaces, a central unresolved challenge in computational learning. We also prove that the intersection of two majority functions has threshold degree Omega(log n), which is tight and settles a conjecture of O'Donnell and Servedio (2003). Our proof consists of two parts. First, we show that for any nonconstant Boolean functions f and g, the intersection f(x)^g(y) has threshold degree O(d) if and only if ||f-F||_infty + ||g-G||_infty < 1 for some rational functions F, G of degree O(d). Second, we settle the least degree required for approximating a halfspace and a majority function to any given accuracy by rational functions. Our technique further allows us to make progress on Aaronson's challenge (2008) and contribute strong direct product theorems for polynomial representations of composed Boolean functions of the form F(f_1,...,f_n). In particular, we give an improved lower bound on the approximate degree of the AND-OR tree.Comment: Full version of the FOCS'09 pape
    • …
    corecore