An Efficient Tester-Learner for Halfspaces
We give the first efficient algorithm for learning halfspaces in the testable
learning model recently defined by Rubinfeld and Vasilyan (2023). In this
model, a learner certifies that the accuracy of its output hypothesis is near
optimal whenever the training set passes an associated test, and training sets
drawn from some target distribution -- e.g., the Gaussian -- must pass the
test. This model is more challenging than distribution-specific agnostic or
Massart noise models where the learner is allowed to fail arbitrarily if the
distributional assumption does not hold.
We consider the setting where the target distribution is Gaussian (or more
generally any strongly log-concave distribution) in $d$ dimensions and the
noise model is either Massart or adversarial (agnostic). For Massart noise, our
tester-learner runs in polynomial time and outputs a hypothesis with
(information-theoretically optimal) error $\mathsf{opt} + \epsilon$ for any
strongly log-concave target distribution. For adversarial noise, our
tester-learner obtains error $O(\mathsf{opt}) + \epsilon$ in polynomial time
when the target distribution is Gaussian; for strongly log-concave
distributions, we obtain $\tilde{O}(\mathsf{opt}) + \epsilon$ in
quasipolynomial time.
Prior work on testable learning ignores the labels in the training set and
checks that the empirical moments of the covariates are close to the moments of
the base distribution. Here we develop new tests of independent interest that
make critical use of the labels and combine them with the moment-matching
approach of Gollakota et al. (2023). This enables us to simulate a variant of
the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using
nonconvex SGD, but in the testable learning setting.
Comment: 26 pages, 3 figures, Version v2: strengthened the agnostic guarantee
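The label-oblivious check attributed to prior work above (compare the empirical moments of the covariates with those of the base distribution) can be sketched as follows. This is a minimal illustration, assuming the base distribution is the standard Gaussian $N(0, I_d)$ and using one hypothetical tolerance `tol` for every moment; the function names are hypothetical, and this is not the calibrated test from the cited papers, nor the new label-aware tests this paper introduces.

```python
import itertools
import math
import random


def gaussian_moment(alpha):
    """E[prod_i x_i**alpha_i] for x ~ N(0, I_d).

    Coordinates are independent, so this is a product of 1-D moments:
    E[x**k] = (k-1)!! for even k and 0 for odd k.
    """
    result = 1
    for k in alpha:
        if k % 2 == 1:
            return 0.0
        result *= math.prod(range(k - 1, 0, -2))  # (k-1)!!; empty product is 1
    return float(result)


def multi_indices(d, max_degree):
    """All exponent vectors alpha in N^d with 1 <= |alpha| <= max_degree."""
    for alpha in itertools.product(range(max_degree + 1), repeat=d):
        if 1 <= sum(alpha) <= max_degree:
            yield alpha


def empirical_moment(xs, alpha):
    """Sample average of prod_i x_i**alpha_i."""
    total = 0.0
    for x in xs:
        term = 1.0
        for xi, k in zip(x, alpha):
            term *= xi ** k
        total += term
    return total / len(xs)


def passes_moment_test(xs, max_degree, tol):
    """Hypothetical tester: accept iff every empirical moment of degree
    <= max_degree is within tol of the standard Gaussian moment.
    Only covariates are used; labels are ignored. A single shared tol is
    an illustrative simplification."""
    d = len(xs[0])
    return all(
        abs(empirical_moment(xs, alpha) - gaussian_moment(alpha)) <= tol
        for alpha in multi_indices(d, max_degree)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(20000)]
    uniform = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(20000)]
    print(passes_moment_test(gaussian, max_degree=4, tol=0.3))
    print(passes_moment_test(uniform, max_degree=4, tol=0.3))
```

A uniform sample on $[-1, 1]^2$ is rejected already at degree 2, since its second moment is $1/3$ rather than $1$; because the test never looks at labels, it cannot detect problems that only show up in the joint distribution of $(x, y)$.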
A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity
A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the
study of \emph{testable learning}, where the goal is to replace hard-to-verify
distributional assumptions (such as Gaussianity) with efficiently testable ones
and to require that the learner succeed whenever the unknown distribution
passes the corresponding test. In this model, they gave an efficient algorithm
for learning halfspaces under testable assumptions that are provably satisfied
by Gaussians.
In this paper we give a powerful new approach for developing algorithms for
testable learning using tools from moment matching and metric distances in
probability. We obtain efficient testable learners for any concept class that
admits low-degree \emph{sandwiching polynomials}, capturing most important
examples for which we have ordinary agnostic learners. We recover the results
of Rubinfeld and Vasilyan as a corollary of our techniques while achieving
improved, near-optimal sample complexity bounds for a broad range of concept
classes and distributions.
Surprisingly, we show that the information-theoretic sample complexity of
testable learning is tightly characterized by the Rademacher complexity of the
concept class, one of the most well-studied measures in statistical learning
theory. In particular, uniform convergence is necessary and sufficient for
testable learning. This leads to a fundamental separation from (ordinary)
distribution-specific agnostic learning, where uniform convergence is
sufficient but not necessary.
Comment: 34 pages
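To make the quantity in this characterization concrete: the empirical Rademacher complexity of a class $\mathcal{H}$ on points $x_1, \dots, x_n$ is $\hat{\mathfrak{R}}_S(\mathcal{H}) = \mathbb{E}_\sigma\big[\sup_{h \in \mathcal{H}} \frac{1}{n} \sum_i \sigma_i h(x_i)\big]$ with i.i.d. uniform signs $\sigma_i \in \{-1, +1\}$. The sketch below estimates it by Monte Carlo for one-dimensional halfspaces $h(x) = s \cdot \mathrm{sign}(x - t)$, where the supremum can be computed exactly with a single scan over sorted points. The choice of class, the number of trials, and the function names are illustrative assumptions, not taken from the paper.

```python
import random


def max_correlation_thresholds(signs):
    """max over h in H of sum_i signs[i] * h(x_(i)), for 1-D halfspaces
    h(x) = s * sign(x - t) with s in {-1, +1}; signs are listed in sorted
    order of the points.

    A threshold just after the k-th smallest point labels the first k points
    -1 and the rest +1, giving total - 2 * prefix_k; flipping s negates it.
    """
    total = sum(signs)
    best = abs(total)  # k = 0: every point on the same side of t
    prefix = 0
    for sig in signs:
        prefix += sig
        best = max(best, abs(total - 2 * prefix))
    return best


def empirical_rademacher_thresholds(xs, trials=1000, seed=0):
    """Monte Carlo estimate of E_sigma[sup_h (1/n) sum_i sigma_i h(x_i)],
    assuming the points in xs are distinct."""
    rng = random.Random(seed)
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])  # thresholds only see the ordering
    acc = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        acc += max_correlation_thresholds([sigma[i] for i in order]) / n
    return acc / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 100, 1000):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, round(empirical_rademacher_thresholds(xs), 3))
```

On one or two distinct points the estimate is exactly 1, because thresholds with both orientations realize every labeling there; as $n$ grows it shrinks roughly like $1/\sqrt{n}$, consistent with uniform convergence for this simple class.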
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
We study the problem of computationally efficient and label-efficient PAC active
learning of $d$-dimensional halfspaces with Tsybakov
noise (Tsybakov, 2004) under structured unlabeled data
distributions. Inspired by Diakonikolas et al. (2020), we prove that any
approximate first-order stationary point of a smooth nonconvex loss function
yields a halfspace with a low excess error guarantee. In light of the above
structural result, we design a nonconvex optimization-based algorithm with a
label complexity of \footnote{In the main body
of this work, we use $\tilde{O}(\cdot)$ to hide factors
of the form $\mathrm{polylog}(d, \frac{1}{\epsilon}, \frac{1}{\delta})$}, under the
assumption that the Tsybakov noise parameter , which
narrows down the gap between the label complexities of the previously known
efficient passive or active
algorithms (Diakonikolas et al., 2020; Zhang and Li, 2021) and the
information-theoretic lower bound in this setting.
Comment: 29 pages
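The structural result above concerns approximate stationary points of a smooth nonconvex loss over halfspaces. The sketch below illustrates that style of algorithm, projected SGD on the unit sphere for a smoothed 0-1 loss, in the spirit of the nonconvex SGD approach for noisy halfspaces. Every modeling choice here is an assumption for illustration: passive i.i.d. sampling (no active label queries), Gaussian covariates, labels flipped at a constant rate `eta` (random classification noise, a special case of Massart noise rather than Tsybakov noise), a logistic-shaped surrogate with a hypothetical smoothing scale `sigma`, and hypothetical step size, step count, and function names. It is not the algorithm of this paper and says nothing about its label complexity.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def sample_noisy_halfspace(rng, w_star, eta):
    """Assumed data model: x ~ N(0, I_d), y = sign(<w*, x>) flipped w.p. eta."""
    x = [rng.gauss(0.0, 1.0) for _ in w_star]
    y = 1 if sum(a * b for a, b in zip(w_star, x)) >= 0 else -1
    if rng.random() < eta:
        y = -y
    return x, y


def projected_sgd(w_star, eta=0.1, sigma=0.1, lr=0.01, steps=20000, seed=0):
    """Hypothetical sketch: projected SGD on the unit sphere for the smooth
    nonconvex loss L(w) = E[sigmoid(-y <w, x> / sigma)]."""
    rng = random.Random(seed)
    w = normalize([rng.gauss(0.0, 1.0) for _ in w_star])
    for _ in range(steps):
        x, y = sample_noisy_halfspace(rng, w_star, eta)
        margin = sum(a * b for a, b in zip(w, x))
        s = sigmoid(-y * margin / sigma)
        # gradient of sigmoid(-y <w, x> / sigma) w.r.t. w: s (1 - s) * (-y / sigma) * x
        coef = s * (1.0 - s) * (-y / sigma)
        g = [coef * xi for xi in x]
        # keep only the component tangent to the sphere at w, then retract
        gw = sum(a * b for a, b in zip(g, w))
        g_perp = [gi - gw * wi for gi, wi in zip(g, w)]
        w = normalize([wi - lr * gi for wi, gi in zip(w, g_perp)])
    return w


if __name__ == "__main__":
    w_star = normalize([1.0, -2.0, 0.5, 0.0, 1.0])
    w = projected_sgd(w_star)
    print("cosine(w, w*) =", round(sum(a * b for a, b in zip(w, w_star)), 3))
```

Projecting the gradient onto the tangent space and renormalizing keeps the iterate a unit vector, so only the direction of the halfspace is learned; under this symmetric noise model $w^\ast$ is a stationary point of the smoothed loss, which is the kind of property the structural result formalizes in much greater generality.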