An Efficient Tester-Learner for Halfspaces
We give the first efficient algorithm for learning halfspaces in the testable
learning model recently defined by Rubinfeld and Vasilyan (2023). In this
model, a learner certifies that the accuracy of its output hypothesis is near
optimal whenever the training set passes an associated test, and training sets
drawn from some target distribution -- e.g., the Gaussian -- must pass the
test. This model is more challenging than distribution-specific agnostic or
Massart noise models, where the learner is allowed to fail arbitrarily if the
distributional assumption does not hold.
We consider the setting where the target distribution is the Gaussian (or,
more generally, any strongly log-concave distribution) in $d$ dimensions and
the noise model is either Massart or adversarial (agnostic). For Massart
noise, our tester-learner runs in polynomial time and outputs a hypothesis
with (information-theoretically optimal) error $\mathrm{opt} + \epsilon$ for
any strongly log-concave target distribution. For adversarial noise, our
tester-learner obtains error $O(\mathrm{opt}) + \epsilon$ in polynomial time
when the target distribution is Gaussian; for strongly log-concave
distributions, we obtain the same guarantee in quasipolynomial time.
Prior work on testable learning ignores the labels in the training set and
checks that the empirical moments of the covariates are close to the moments of
the base distribution. Here we develop new tests of independent interest that
make critical use of the labels and combine them with the moment-matching
approach of Gollakota et al. (2023). This enables us to simulate a variant of
the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using
nonconvex SGD, but in the testable learning setting.
Comment: 26 pages, 3 figures. Version v2: strengthened the agnostic guarantee
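To make the contrast concrete, the label-oblivious test used in prior testable-learning work can be sketched as follows. This is a minimal illustration checking only moments up to degree 2 against the standard Gaussian; the function name, degree cutoff, and tolerance are assumptions for exposition, not the actual tester from the papers cited above.

```python
import numpy as np

def moment_test(X, tol=0.1):
    """Illustrative label-oblivious moment test: accept the covariates X only
    if their low-degree empirical moments are close to those of N(0, I).
    (Sketch only; the threshold and the degree-2 cutoff are assumptions.)"""
    n, d = X.shape
    # First moments of N(0, I) are zero.
    if np.max(np.abs(X.mean(axis=0))) > tol:
        return False
    # Second moments of N(0, I) form the identity matrix.
    second = X.T @ X / n
    if np.max(np.abs(second - np.eye(d))) > tol:
        return False
    return True
```

A Gaussian sample passes, while a shifted sample fails the first-moment check; note the test never inspects labels, which is exactly the limitation the label-aware tests above address.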
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
We study the problem of computationally and label efficient PAC active
learning of $d$-dimensional halfspaces with Tsybakov
noise~\citep{tsybakov2004optimal} under structured unlabeled data
distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any
approximate first-order stationary point of a smooth nonconvex loss function
yields a halfspace with a low excess error guarantee. In light of this
structural result, we design a nonconvex optimization-based algorithm with an
improved label complexity\footnote{In the main body of this work, we use
$\tilde{O}(\cdot)$ to hide factors of the form $\polylog(d, \frac{1}{\epsilon},
\frac{1}{\delta})$.} under an assumption on the Tsybakov noise parameter
$\alpha$, which narrows the gap between the label complexities of the
previously known efficient passive or active
algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the
information-theoretic lower bound in this setting.
Comment: 29 pages
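The core idea shared by both abstracts, that an approximate stationary point of a smooth nonconvex surrogate loss already yields a good halfspace, can be sketched as follows. This is a toy sketch in the spirit of the cited nonconvex-SGD approach, not the papers' algorithm: the sigmoid surrogate, step size, smoothing parameter sigma, and function name are all illustrative assumptions, and the active-learning label-query strategy is omitted entirely.

```python
import numpy as np

def sgd_halfspace(X, y, sigma=0.5, lr=0.1, epochs=5, seed=0):
    """Toy sketch: projected SGD on the smooth nonconvex surrogate
    L(w) = E[ sigmoid(-y <w, x> / sigma) ] over the unit sphere, returning an
    approximate stationary point as the halfspace normal. (Illustrative
    hyperparameters; not the algorithm from the cited papers.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for i in rng.permutation(n):
            m = y[i] * X[i] @ w / sigma      # scaled margin of example i
            s = 1.0 / (1.0 + np.exp(m))      # sigmoid(-m) = surrogate loss value
            grad = -s * (1.0 - s) * y[i] * X[i] / sigma  # gradient of sigmoid(-m)
            w -= lr * grad
            w /= np.linalg.norm(w)           # project back onto the unit sphere
    return w
```

On clean data labeled by a true halfspace, the returned normal typically classifies most points correctly, illustrating the structural claim that stationarity of the surrogate translates into low excess error.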