
    Learning Kernel-Based Halfspaces with the Zero-One Loss

    We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the \emph{zero-one} loss function. Unlike most previous formulations, which rely on surrogate convex loss functions (e.g., hinge loss in SVM and log loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural zero-one loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time $\mathrm{poly}(\exp(L\log(L/\epsilon)))$, for \emph{any} distribution, where $L$ is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.
    Comment: This is a full version of the paper appearing in the 23rd International Conference on Learning Theory (COLT 2010). Compared to the previous arXiv version, this version contains some small corrections in the proof of Lemma 3 and in the appendix.
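
    The abstract contrasts the zero-one loss with the convex surrogates used by SVMs (hinge loss) and logistic regression (log loss). The minimal sketch below only illustrates that contrast on synthetic margins; it is not the paper's algorithm, and all names in it are illustrative.

```python
# Illustrative sketch (not the paper's algorithm): compare the zero-one loss
# with the convex surrogate losses mentioned in the abstract, evaluated on
# signed margins m_i = y_i * f(x_i) for some score function f.
import numpy as np

def zero_one_loss(margins):
    # An example is misclassified exactly when its signed margin is <= 0.
    return float(np.mean(margins <= 0))

def hinge_loss(margins):
    # Convex surrogate used by SVMs: max(0, 1 - y * f(x)).
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

def log_loss(margins):
    # Convex surrogate used by logistic regression: log(1 + exp(-y * f(x))).
    return float(np.mean(np.log1p(np.exp(-margins))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    margins = rng.normal(loc=0.5, scale=1.0, size=1000)  # synthetic y*f(x) values
    print("zero-one:", zero_one_loss(margins))
    print("hinge   :", hinge_loss(margins))
    print("log     :", log_loss(margins))
```

    The surrogates upper-bound the zero-one loss and are convex in the margin, which is what makes them easy to optimize; the paper's point is to give guarantees directly for the non-convex zero-one loss instead.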

    Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin

    We study the problem of {\em properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $\alpha \cdot \mathrm{OPT}_{\gamma} + \epsilon$, where $\mathrm{OPT}_{\gamma}$ is the optimal $\gamma$-margin error rate and $\alpha \geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $\alpha \geq 1$, that are nearly matching for a range of parameters. Specifically, for the natural setting in which $\alpha$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $\alpha = 1.01$-approximate proper learner that uses $O(1/(\epsilon^2\gamma^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/\epsilon) \cdot 2^{\tilde{O}(1/\gamma^2)}$. On the negative side, we show that {\em any} constant factor approximate proper learner has runtime $\mathrm{poly}(d/\epsilon) \cdot 2^{(1/\gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
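
    To make the benchmark concrete, the sketch below computes the empirical $\gamma$-margin error of a candidate unit-norm halfspace $w$, i.e. the fraction of unit-ball examples with $y \langle w, x \rangle < \gamma$; $\mathrm{OPT}_{\gamma}$ is the minimum of this quantity over all halfspaces. This is only an illustration of the quantity being approximated, not the paper's proper learner, and the data generation is made up for the example.

```python
# Illustrative sketch: empirical gamma-margin error of a candidate halfspace w
# on unit-norm examples, i.e. the fraction of points with y * <w, x> < gamma.
# OPT_gamma is the minimum of this quantity over all halfspaces; this is not
# the paper's approximate proper learner.
import numpy as np

def gamma_margin_error(w, X, y, gamma):
    """Fraction of examples that w fails to classify with margin at least gamma."""
    w = w / np.linalg.norm(w)        # halfspaces are taken to be unit-norm
    margins = y * (X @ w)            # signed margins y_i * <w, x_i>
    return float(np.mean(margins < gamma))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n, gamma = 5, 2000, 0.1
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # points on the unit ball
    y = np.sign(X @ w_true)                          # synthetic labels
    print("empirical gamma-margin error of w_true:",
          gamma_margin_error(w_true, X, y, gamma))
```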