
    Learning Kernel-Based Halfspaces with the Zero-One Loss

    We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the \emph{zero-one} loss function. Unlike most previous formulations, which rely on surrogate convex loss functions (e.g., hinge loss in SVM and log loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural zero-one loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time $\mathrm{poly}(\exp(L\log(L/\epsilon)))$, for \emph{any} distribution, where $L$ is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.
    Comment: This is a full version of the paper appearing in the 23rd International Conference on Learning Theory (COLT 2010). Compared to the previous arXiv version, this version contains some small corrections in the proof of Lemma 3 and in the appendix.
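
    The abstract contrasts the zero-one loss with the convex surrogates used by SVMs (hinge loss) and logistic regression (log loss). The minimal sketch below only illustrates that contrast on synthetic margins; it is not the paper's algorithm, and all names in it are illustrative.

```python
# Illustrative sketch (not the paper's algorithm): compare the zero-one loss
# with the convex surrogate losses mentioned in the abstract, evaluated on
# signed margins m_i = y_i * f(x_i) for some score function f.
import numpy as np

def zero_one_loss(margins):
    # An example is misclassified exactly when its signed margin is <= 0.
    return float(np.mean(margins <= 0))

def hinge_loss(margins):
    # Convex surrogate used by SVMs: max(0, 1 - y * f(x)).
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

def log_loss(margins):
    # Convex surrogate used by logistic regression: log(1 + exp(-y * f(x))).
    return float(np.mean(np.log1p(np.exp(-margins))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    margins = rng.normal(loc=0.5, scale=1.0, size=1000)  # synthetic y*f(x) values
    print("zero-one:", zero_one_loss(margins))
    print("hinge   :", hinge_loss(margins))
    print("log     :", log_loss(margins))
```

    The surrogates upper-bound the zero-one loss and are convex in the margin, which is what makes them easy to optimize; the paper's point is to give guarantees directly for the non-convex zero-one loss instead.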

    Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin

    We study the problem of {\em properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $\alpha \cdot \mathrm{OPT}_{\gamma} + \epsilon$, where $\mathrm{OPT}_{\gamma}$ is the optimal $\gamma$-margin error rate and $\alpha \geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $\alpha \geq 1$, that are nearly matching for a range of parameters. Specifically, for the natural setting in which $\alpha$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $\alpha = 1.01$-approximate proper learner that uses $O(1/(\epsilon^2\gamma^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/\epsilon) \cdot 2^{\tilde{O}(1/\gamma^2)}$. On the negative side, we show that {\em any} constant factor approximate proper learner has runtime $\mathrm{poly}(d/\epsilon) \cdot 2^{(1/\gamma)^{2-o(1)}}$, assuming the Exponential Time Hypothesis.
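
    To make the benchmark concrete, the sketch below computes the empirical $\gamma$-margin error of a candidate unit-norm halfspace $w$, i.e. the fraction of unit-ball examples with $y \langle w, x \rangle < \gamma$; $\mathrm{OPT}_{\gamma}$ is the minimum of this quantity over all halfspaces. This is only an illustration of the quantity being approximated, not the paper's proper learner, and the data generation is made up for the example.

```python
# Illustrative sketch: empirical gamma-margin error of a candidate halfspace w
# on unit-norm examples, i.e. the fraction of points with y * <w, x> < gamma.
# OPT_gamma is the minimum of this quantity over all halfspaces; this is not
# the paper's approximate proper learner.
import numpy as np

def gamma_margin_error(w, X, y, gamma):
    """Fraction of examples that w fails to classify with margin at least gamma."""
    w = w / np.linalg.norm(w)        # halfspaces are taken to be unit-norm
    margins = y * (X @ w)            # signed margins y_i * <w, x_i>
    return float(np.mean(margins < gamma))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n, gamma = 5, 2000, 0.1
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # points on the unit ball
    y = np.sign(X @ w_true)                          # synthetic labels
    print("empirical gamma-margin error of w_true:",
          gamma_margin_error(w_true, X, y, gamma))
```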