
    Efficient Learning of Linear Separators under Bounded Noise

    We study the learnability of linear separators in $\mathbb{R}^d$ in the presence of bounded (a.k.a. Massart) noise. This is a realistic generalization of the random classification noise model, in which the adversary can flip the label of each example $x$ with probability $\eta(x) \leq \eta$. We provide the first polynomial-time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit ball in $\mathbb{R}^d$, for some constant value of $\eta$. While widely studied in the statistical learning theory community in the context of obtaining faster convergence rates, computationally efficient algorithms in this model had remained elusive. Our work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model, and thus opens up a new and exciting line of research. We additionally provide lower bounds showing that popular algorithms such as hinge loss minimization and averaging cannot achieve arbitrarily small excess error under Massart noise, even under the uniform distribution. Our work instead makes use of a margin-based technique developed in the context of active learning. As a result, our algorithm is also an active learning algorithm whose label complexity is only logarithmic in the desired excess error $\epsilon$.
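    To make the bounded-noise model concrete, here is a minimal Python sketch (not the paper's algorithm): it draws points uniformly from the unit ball, labels them with a target halfspace, and flips each label independently with an example-dependent probability $\eta(x) \leq \eta$. The particular choice of $\eta(x)$ below is purely illustrative; Massart noise allows any flip-probability function bounded by $\eta$.

```python
import numpy as np

def sample_unit_ball(n, d, rng):
    """Draw n points uniformly at random from the unit ball in R^d."""
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # uniform on the sphere
    r = rng.random(n) ** (1.0 / d)                  # radius via CDF inversion
    return x * r[:, None]

def massart_labels(X, w_star, eta, rng):
    """Label with sign(<w*, x>), then flip each label independently with
    probability eta(x) <= eta.  The flip probabilities here are only an
    illustrative choice (more noise near the decision boundary)."""
    clean = np.sign(X @ w_star)
    eta_x = eta * np.exp(-np.abs(X @ w_star))
    flips = rng.random(len(X)) < eta_x
    return np.where(flips, -clean, clean)

rng = np.random.default_rng(0)
d, n, eta = 20, 5000, 0.2
w_star = np.zeros(d)
w_star[0] = 1.0                                     # hypothetical target halfspace
X = sample_unit_ball(n, d, rng)
y = massart_labels(X, w_star, eta, rng)
```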

    An Efficient Tester-Learner for Halfspaces

    We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution -- e.g., the Gaussian -- must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models, where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is Gaussian (or, more generally, any strongly log-concave distribution) in $d$ dimensions and the noise model is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error $\mathsf{opt} + \epsilon$ for any strongly log-concave target distribution. For adversarial noise, our tester-learner obtains error $O(\mathsf{opt}) + \epsilon$ in polynomial time when the target distribution is Gaussian; for strongly log-concave distributions, we obtain $\tilde{O}(\mathsf{opt}) + \epsilon$ in quasipolynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. (2023). This enables us to simulate a variant of the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using nonconvex SGD, but in the testable learning setting. Comment: 26 pages, 3 figures. Version v2: strengthened the agnostic guarantee.
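    For intuition about the label-free moment tests that prior testable-learning work relies on (and which this paper augments with label-dependent tests), the following is a hedged Python sketch: it compares empirical low-degree mixed moments of the covariates against those of a standard Gaussian. The thresholds and the exact family of test statistics used in the paper differ; this only conveys the moment-matching idea.

```python
import numpy as np
from itertools import combinations_with_replacement

def gaussian_moment(idx):
    """E[prod_i Z_idx[i]] for Z ~ N(0, I): zero unless every coordinate appears
    an even number of times; each even count c contributes (c-1)!!."""
    counts = {}
    for i in idx:
        counts[i] = counts.get(i, 0) + 1
    if any(c % 2 for c in counts.values()):
        return 0.0
    out = 1.0
    for c in counts.values():
        out *= float(np.prod(np.arange(c - 1, 0, -2)))  # (c-1)!!
    return out

def moments_close_to_gaussian(X, degree=4, tol=0.1):
    """Accept iff every empirical mixed moment of degree <= `degree` is within
    `tol` of its N(0, I) value.  Exhaustive over O(d^degree) monomials, so this
    is only sensible for small d; the tolerance is an illustrative placeholder."""
    n, d = X.shape
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            emp = float(np.mean(np.prod(X[:, list(idx)], axis=1)))
            if abs(emp - gaussian_moment(idx)) > tol:
                return False
    return True
```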

    On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data

    In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large-margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches, based on the Massart noise model and on self-supervised learning respectively, and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improves the previous results. Our methods could also be used for other private PAC learning problems. Comment: To appear in The 14th Asian Conference on Machine Learning (ACML 2022).
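    As a point of reference for how local privacy can interact with bounded noise, the standard $\epsilon$-LDP randomized-response mechanism on binary labels induces a known flip rate strictly below $1/2$, i.e. an instance of Massart (bounded) noise. The sketch below is a generic construction for illustration only, not necessarily the protocol used in the paper.

```python
import numpy as np

def randomized_response(y, epsilon, rng):
    """epsilon-LDP randomized response on labels y in {-1, +1}: keep each label
    with probability e^eps / (e^eps + 1), flip it otherwise.  The induced flip
    rate eta = 1 / (e^eps + 1) < 1/2 is a (known, uniform) bounded-noise rate."""
    keep_prob = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    keep = rng.random(len(y)) < keep_prob
    return np.where(keep, y, -y)

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=1000)
y_private = randomized_response(y, epsilon=1.0, rng=rng)
```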

    Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

    We study the problem of computationally and label efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov noise (Tsybakov, 2004) under structured unlabeled data distributions. Inspired by Diakonikolas et al. (2020), we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of this structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$, where $\tilde{O}(\cdot)$ and $\tilde{\Theta}(\cdot)$ hide factors of the form $\mathrm{polylog}(d, \frac{1}{\epsilon}, \frac{1}{\delta})$, under the assumption that the Tsybakov noise parameter satisfies $\alpha \in (\frac{1}{3}, 1]$. This narrows the gap between the label complexities of the previously known efficient passive or active algorithms (Diakonikolas et al., 2020; Zhang and Li, 2021) and the information-theoretic lower bound in this setting. Comment: 29 pages.
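    The structural result above suggests a simple recipe: run projected SGD on a smooth nonconvex surrogate over the unit sphere and return any approximate first-order stationary point. The Python sketch below follows that recipe with a sigmoidal loss in the spirit of Diakonikolas et al. (2020); the loss parameter, step size, and batch size are illustrative placeholders, and the paper's active label-query strategy is omitted.

```python
import numpy as np

def sigmoid_loss_grad(w, X, y, sigma=0.5):
    """Gradient of the smooth nonconvex surrogate
    ell(w; x, y) = 1 / (1 + exp(y <w, x> / sigma)), averaged over the batch."""
    m = np.clip(y * (X @ w) / sigma, -30.0, 30.0)
    s = 1.0 / (1.0 + np.exp(m))                     # per-example loss value
    # d/dw [1 / (1 + exp(m))] = -s * (1 - s) * y * x / sigma
    g = -(s * (1.0 - s) * y / sigma)[:, None] * X
    return g.mean(axis=0)

def project_sphere(w):
    """Project onto the unit sphere (halfspaces are scale invariant)."""
    return w / max(np.linalg.norm(w), 1e-12)

def psgd_halfspace(X, y, steps=2000, lr=0.1, batch=128, rng=None):
    """Projected SGD toward an approximate stationary point of the surrogate;
    the returned unit vector defines the halfspace sign(<w, x>)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    w = project_sphere(rng.standard_normal(d))
    for _ in range(steps):
        idx = rng.integers(0, n, size=min(batch, n))
        w = project_sphere(w - lr * sigmoid_loss_grad(w, X[idx], y[idx]))
    return w
```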

    Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability

    In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $\eta$. In a recent work, Diakonikolas, Gouleakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $\eta + \epsilon$ for any $\epsilon > 0$. However, their algorithm outputs a complicated hypothesis, which partitions space into $\mathrm{poly}(d, 1/\epsilon)$ regions. Here we give a much simpler algorithm and in the process resolve a number of outstanding open questions: (1) We give the first proper learner for Massart halfspaces that achieves $\eta + \epsilon$. We also give improved bounds on the sample complexity achievable by polynomial-time algorithms. (2) Based on (1), we develop a blackbox knowledge distillation procedure that converts an arbitrarily complex classifier into an equally good proper classifier, as sketched below. (3) By leveraging a simple but overlooked connection to evolvability, we show that any SQ algorithm requires super-polynomially many queries to achieve $\mathsf{OPT} + \epsilon$. Moreover, we study generalized linear models where $\mathbb{E}[Y \mid \mathbf{X}] = \sigma(\langle \mathbf{w}^*, \mathbf{X} \rangle)$ for any odd, monotone, and Lipschitz function $\sigma$. This family includes the previously mentioned halfspace models as a special case, but is much richer and includes other fundamental models like logistic regression. We introduce a challenging new corruption model that generalizes Massart noise, and give a general algorithm for learning in this setting. Our algorithms are based on a small set of core recipes for learning to classify in the presence of misspecification. Finally, we study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties. Comment: 51 pages, comments welcome.
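    To illustrate the blackbox distillation step in (2), the hedged Python sketch below labels fresh unlabeled points with the complicated hypothesis and fits a proper halfspace to those pseudo-labels via regularized least squares. The paper's actual procedure and its guarantees are more involved; this only conveys the interface of such a reduction.

```python
import numpy as np

def distill_to_halfspace(complex_predict, X_unlabeled, reg=1e-3):
    """Blackbox distillation sketch: query the complicated hypothesis on fresh
    unlabeled points, then fit a proper halfspace to the resulting +/-1
    pseudo-labels (here by ridge regression; `reg` is an illustrative choice)."""
    y_pseudo = complex_predict(X_unlabeled)            # values in {-1, +1}
    d = X_unlabeled.shape[1]
    A = X_unlabeled.T @ X_unlabeled + reg * np.eye(d)
    w = np.linalg.solve(A, X_unlabeled.T @ y_pseudo)
    return lambda X_new: np.sign(X_new @ w)            # proper halfspace predictor
```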