Search CORE

50 research outputs found

The Power of Localization for Efficiently Learning Linear Separators with Noise

Author: Awasthi Pranjal
Balcan Maria Florina
Long Philip M.
Publication venue
Publication date: 03/06/2018
Field of study

We introduce a new approach for designing computationally efficient learning algorithms that are tolerant to noise, and demonstrate its effectiveness by designing algorithms with improved noise tolerance guarantees for learning linear separators. We consider both the malicious noise model and the adversarial label noise model. For malicious noise, where the adversary can corrupt both the label and the features, we provide a polynomial-time algorithm for learning linear separators in

\Re^d

under isotropic log-concave distributions that can tolerate a nearly information-theoretically optimal noise rate of

\eta = \Omega(\epsilon)

. For the adversarial label noise model, where the distribution over the feature vectors is unchanged, and the overall probability of a noisy label is constrained to be at most

\eta

, we also give a polynomial-time algorithm for learning linear separators in

\Re^d

under isotropic log-concave distributions that can handle a noise rate of

\eta = \Omega\left(\epsilon\right)

. We show that, in the active learning model, our algorithms achieve a label complexity whose dependence on the error parameter

\epsilon

is polylogarithmic. This provides the first polynomial-time active learning algorithm for learning linear separators in the presence of malicious noise or adversarial label noise.Comment: Contains improved label complexity analysis communicated to us by Steve Hannek

arXiv.org e-Print Archive

FigShare

Moment-Matching Polynomials

Author: Klivans Adam
Meka Raghu
Publication venue
Publication date: 04/01/2013
Field of study

We give a new framework for proving the existence of low-degree, polynomial approximators for Boolean functions with respect to broad classes of non-product distributions. Our proofs use techniques related to the classical moment problem and deviate significantly from known Fourier-based methods, which require the underlying distribution to have some product structure. Our main application is the first polynomial-time algorithm for agnostically learning any function of a constant number of halfspaces with respect to any log-concave distribution (for any constant accuracy parameter). This result was not known even for the case of learning the intersection of two halfspaces without noise. Additionally, we show that in the "smoothed-analysis" setting, the above results hold with respect to distributions that have sub-exponential tails, a property satisfied by many natural and well-studied distributions in machine learning. Given that our algorithms can be implemented using Support Vector Machines (SVMs) with a polynomial kernel, these results give a rigorous theoretical explanation as to why many kernel methods work so well in practice

arXiv.org e-Print Archive

CiteSeerX

Learning Kernel-Based Halfspaces with the Zero-One Loss

Author: Shalev-Shwartz Shai
Shamir Ohad
Sridharan Karthik
Publication venue
Publication date: 01/01/2010
Field of study

We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the \emph{zero-one} loss function. Unlike most previous formulations which rely on surrogate convex loss functions (e.g. hinge-loss in SVM and log-loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural zero-one loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time \poly(\exp(L\log(L/\epsilon))), for \emph{any} distribution, where

L

is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most

\epsilon

. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in

L

.Comment: This is a full version of the paper appearing in the 23rd International Conference on Learning Theory (COLT 2010). Compared to the previous arXiv version, this version contains some small corrections in the proof of Lemma 3 and in appendix

arXiv.org e-Print Archive

CiteSeerX

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

Author: Bun Mark
Steinke Thomas
Publication venue
Publication date: 08/12/2014
Field of study

Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on

\mathbb{R}^n

in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to

\exp(-|x|^{0.99})

. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only

O(\log(1/\delta))

-wise independence, for a tail probability of

\delta

. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science.Comment: 22 page

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Efficient Learning of Linear Separators under Bounded Noise

Author: Awasthi Pranjal
Balcan Maria-Florina
Haghtalab Nika
Urner Ruth
Publication venue
Publication date: 12/03/2015
Field of study

We study the learnability of linear separators in

\Re^d

in the presence of bounded (a.k.a Massart) noise. This is a realistic generalization of the random classification noise model, where the adversary can flip each example

x

with probability

\eta(x) \leq \eta

. We provide the first polynomial time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit ball in

\Re^d

, for some constant value of

\eta

. While widely studied in the statistical learning theory community in the context of getting faster convergence rates, computationally efficient algorithms in this model had remained elusive. Our work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research. We additionally provide lower bounds showing that popular algorithms such as hinge loss minimization and averaging cannot lead to arbitrarily small excess error under Massart noise, even under the uniform distribution. Our work instead, makes use of a margin based technique developed in the context of active learning. As a result, our algorithm is also an active learning algorithm with label complexity that is only a logarithmic the desired excess error

\epsilon

arXiv.org e-Print Archive

MPG.PuRe