Search CORE

159 research outputs found

Moment-Matching Polynomials

Author: Klivans Adam
Meka Raghu
Publication venue
Publication date: 04/01/2013
Field of study

We give a new framework for proving the existence of low-degree, polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Given that our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results give a rigorous theoretical explanation as to why many kernel methods work so well in practice

arXiv.org e-Print Archive

CiteSeerX

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

Author: Bun Mark
Steinke Thomas
Publication venue
Publication date: 08/12/2014
Field of study

Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on

\mathbb{R}^n

in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to

\exp(-|x|^{0.99})

. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only

O(\log(1/\delta))

-wise independence, for a tail probability of

\delta

. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.Comment: 22 page

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

The Average Sensitivity of an Intersection of Half Spaces

Author: Kane Daniel M.
Publication venue
Publication date: 26/01/2015
Field of study

We prove new bounds on the average sensitivity of the indicator function of an intersection of

k

halfspaces. In particular, we prove the optimal bound of

O(\sqrt{n\log(k)})

. This generalizes a result of Nazarov, who proved the analogous result in the Gaussian case, and improves upon a result of Harsha, Klivans and Meka. Furthermore, our result has implications for the runtime required to learn intersections of halfspaces

arXiv.org e-Print Archive

CiteSeerX

Crossref

From average case complexity to improper learning complexity

Author: Berthet Q.
Daniely A.
Feige U.
Vapnik V. N.
Publication venue
Publication date: 01/01/2014
Field of study

The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.k.a. representation independent learning).The difficulty in proving lower bounds for improper learning is that the standard reductions from

\mathbf{NP}

-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far reaching implications. In particular, 1. Learning

\mathrm{DNF}

's is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of

\omega(1)

halfspaces is hard.Comment: 34 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

Learning Geometric Concepts with Nasty Noise

Author: Daniely A.
Diakonikolas I.
High Robust Estimators
Learning
Publication venue
Publication date: 05/07/2017
Field of study

We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error

O(\epsilon)

, where

\epsilon

is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest

arXiv.org e-Print Archive

Crossref

MCMC Learning

Author: Kanade Varun
Mossel Elchanan
Publication venue
Publication date: 12/06/2015
Field of study

The theory of learning under the uniform distribution is rich and deep, with connections to cryptography, computational complexity, and the analysis of boolean functions to name a few areas. This theory however is very limited due to the fact that the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model. A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRF). Markov Random Fields are one of the main tools for modeling high dimensional data in many areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution.Comment: 28 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

The Power of Localization for Efficiently Learning Linear Separators with Noise

Author: Awasthi Pranjal
Balcan Maria Florina
Long Philip M.
Publication venue
Publication date: 03/06/2018
Field of study

We introduce a new approach for designing computationally efficient learning algorithms that are tolerant to noise, and demonstrate its effectiveness by designing algorithms with improved noise tolerance guarantees for learning linear separators. We consider both the malicious noise model and the adversarial label noise model. For malicious noise, where the adversary can corrupt both the label and the features, we provide a polynomial-time algorithm for learning linear separators in

\Re^d

under isotropic log-concave distributions that can tolerate a nearly information-theoretically optimal noise rate of

\eta = \Omega(\epsilon)

. For the adversarial label noise model, where the distribution over the feature vectors is unchanged, and the overall probability of a noisy label is constrained to be at most

\eta

, we also give a polynomial-time algorithm for learning linear separators in

\Re^d

under isotropic log-concave distributions that can handle a noise rate of

\eta = \Omega\left(\epsilon\right)

. We show that, in the active learning model, our algorithms achieve a label complexity whose dependence on the error parameter

\epsilon

is polylogarithmic. This provides the first polynomial-time active learning algorithm for learning linear separators in the presence of malicious noise or adversarial label noise.Comment: Contains improved label complexity analysis communicated to us by Steve Hannek

arXiv.org e-Print Archive

FigShare