41 research outputs found
Efficient Learning of Linear Separators under Bounded Noise
We study the learnability of linear separators in $\mathbb{R}^d$ in the presence of
bounded (a.k.a. Massart) noise. This is a realistic generalization of the random
classification noise model, where the adversary can flip the label of each example $x$ with
probability $\eta(x) \leq \eta$. We provide the first polynomial time algorithm
that can learn linear separators to arbitrarily small excess error in this
noise model under the uniform distribution over the unit ball in $\mathbb{R}^d$, for
some constant value of $\eta$. While widely studied in the statistical learning
theory community in the context of getting faster convergence rates,
computationally efficient algorithms in this model had remained elusive. Our
work provides the first evidence that one can indeed design algorithms
achieving arbitrarily small excess error in polynomial time under this
realistic noise model and thus opens up a new and exciting line of research.
We additionally provide lower bounds showing that popular algorithms such as
hinge loss minimization and averaging cannot lead to arbitrarily small excess
error under Massart noise, even under the uniform distribution. Our work
instead makes use of a margin-based technique developed in the context of
active learning. As a result, our algorithm is also an active learning
algorithm with label complexity that is only logarithmic in the desired excess
error.
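The bounded-noise model described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the target vector `w_star`, the bound `eta = 0.2`, and the particular `adversary` function are hypothetical choices made for illustration, and the code shows the noise model only, not the paper's learning algorithm.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Draw a point uniformly from the unit ball in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / d)  # radius density is proportional to r^(d-1)
    return [radius * v / norm for v in g]


def halfspace_label(w, x):
    """Clean label sign(<w, x>), with ties broken toward +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def massart_sample(w, eta, flip_prob, n, d, seed=0):
    """Return n labeled examples; x's label is flipped with probability min(flip_prob(x), eta)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = sample_unit_ball(d, rng)
        y = halfspace_label(w, x)
        if rng.random() < min(flip_prob(x), eta):
            y = -y
        data.append((x, y))
    return data


def adversary(x):
    """Hypothetical adversary: flips more often near the decision boundary x[0] = 0."""
    return 0.2 * (1.0 - abs(x[0]))


w_star = [1.0, 0.0, 0.0]  # assumed target halfspace
eta = 0.2                 # assumed noise bound
sample = massart_sample(w_star, eta, adversary, n=2000, d=3)
noise_rate = sum(y != halfspace_label(w_star, x) for x, y in sample) / len(sample)
print(f"empirical flip rate: {noise_rate:.3f} (per-example bound eta = {eta})")
```

Random classification noise is the special case in which `flip_prob` returns the same constant for every example; the Massart adversary may choose any per-example rate up to `eta`.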
An Efficient Tester-Learner for Halfspaces
We give the first efficient algorithm for learning halfspaces in the testable
learning model recently defined by Rubinfeld and Vasilyan (2023). In this
model, a learner certifies that the accuracy of its output hypothesis is near
optimal whenever the training set passes an associated test, and training sets
drawn from some target distribution -- e.g., the Gaussian -- must pass the
test. This model is more challenging than distribution-specific agnostic or
Massart noise models where the learner is allowed to fail arbitrarily if the
distributional assumption does not hold.
We consider the setting where the target distribution is Gaussian (or more
generally any strongly log-concave distribution) in $d$ dimensions and the
noise model is either Massart or adversarial (agnostic). For Massart noise, our
tester-learner runs in polynomial time and outputs a hypothesis with
(information-theoretically optimal) error $\mathsf{opt} + \epsilon$ for any
strongly log-concave target distribution. For adversarial noise, our
tester-learner obtains error $O(\mathsf{opt}) + \epsilon$ in polynomial time
when the target distribution is Gaussian; for strongly log-concave
distributions, we obtain $\tilde{O}(\mathsf{opt}) + \epsilon$ in
quasipolynomial time.
Prior work on testable learning ignores the labels in the training set and
checks that the empirical moments of the covariates are close to the moments of
the base distribution. Here we develop new tests of independent interest that
make critical use of the labels and combine them with the moment-matching
approach of Gollakota et al. (2023). This enables us to simulate a variant of
the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using
nonconvex SGD but in the testable learning setting.
Comment: 26 pages, 3 figures, Version v2: strengthened the agnostic guarantee
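The label-free moment check that this abstract attributes to prior work can be sketched as follows. This is a simplified illustration, not the paper's tester: it assumes a standard Gaussian target, checks only first and second moments, and uses a hypothetical tolerance `tol`. The label-dependent tests introduced in the paper are not shown.

```python
import random


def passes_moment_test(xs, tol):
    """Accept iff empirical means and second moments are within tol of those of N(0, I)."""
    n, d = len(xs), len(xs[0])
    for i in range(d):
        mean_i = sum(x[i] for x in xs) / n
        if abs(mean_i) > tol:
            return False
        for j in range(i, d):
            second = sum(x[i] * x[j] for x in xs) / n
            target = 1.0 if i == j else 0.0  # identity covariance
            if abs(second - target) > tol:
                return False
    return True


rng = random.Random(0)
gaussian = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5000)]
folded = [[abs(v) for v in x] for x in gaussian]  # coordinate means are far from 0

gaussian_ok = passes_moment_test(gaussian, tol=0.15)
folded_ok = passes_moment_test(folded, tol=0.15)
print(gaussian_ok, folded_ok)
```

A sample drawn from the target distribution should pass, while a sample whose low-order moments deviate is rejected before any hypothesis is certified.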
On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data
In this paper, we study the problem of PAC learning halfspaces in the
non-interactive local differential privacy model (NLDP). To breach the barrier
of exponential sample complexity, previous results studied a relaxed setting
where the server has access to some additional public but unlabeled data. We
continue in this direction. Specifically, we consider the problem under the
standard setting instead of the large margin setting studied before. Under
different mild assumptions on the underlying data distribution, we propose two
approaches that are based on the Massart noise model and self-supervised
learning and show that it is possible to achieve sample complexities that are
only linear in the dimension and polynomial in other terms for both private and
public data, which significantly improves on previous results. Our methods
could also be used for other private PAC learning problems.
Comment: To appear in The 14th Asian Conference on Machine Learning (ACML 2022)
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
We study the problem of computationally and label efficient PAC active
learning of $d$-dimensional halfspaces with Tsybakov
noise (Tsybakov, 2004) under structured unlabeled data
distributions. Inspired by Diakonikolas et al. (2020), we prove that any
approximate first-order stationary point of a smooth nonconvex loss function
yields a halfspace with a low excess error guarantee. In light of the above
structural result, we design a nonconvex optimization-based algorithm whose
label complexity (stated up to factors of the form
$\mathrm{polylog}(d, \frac{1}{\epsilon}, \frac{1}{\delta})$), under an
assumption on the Tsybakov noise parameter,
narrows the gap between the label complexities of the previously known
efficient passive or active
algorithms (Diakonikolas et al., 2020; Zhang and Li, 2021) and the
information-theoretic lower bound in this setting.
Comment: 29 pages
Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability
In this paper we revisit some classic problems on classification under
misspecification. In particular, we study the problem of learning halfspaces
under Massart noise with rate $\eta$. In a recent work, Diakonikolas,
Gouleakis, and Tzamos resolved a long-standing problem by giving the first
efficient algorithm for learning to accuracy $\eta + \epsilon$ for any
$\epsilon > 0$. However, their algorithm outputs a complicated hypothesis,
which partitions space into $\mathrm{poly}(d, 1/\epsilon)$ regions. Here we give a
much simpler algorithm and in the process resolve a number of outstanding open
questions:
(1) We give the first proper learner for Massart halfspaces that achieves
$\eta + \epsilon$. We also give improved bounds on the sample complexity
achievable by polynomial time algorithms.
(2) Based on (1), we develop a blackbox knowledge distillation procedure to
convert an arbitrarily complex classifier to an equally good proper classifier.
(3) By leveraging a simple but overlooked connection to evolvability, we show
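any SQ algorithm requires super-polynomially many queries to achieve
$\mathsf{OPT} + \epsilon$.
Moreover we study generalized linear models where
$\mathbb{E}[Y \mid \mathbf{X}] = \sigma(\langle \mathbf{w}^*, \mathbf{X} \rangle)$ for any odd, monotone, and
Lipschitz function $\sigma$. This family includes the previously mentioned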
halfspace models as a special case, but is much richer and includes other
fundamental models like logistic regression. We introduce a challenging new
corruption model that generalizes Massart noise, and give a general algorithm
for learning in this setting. Our algorithms are based on a small set of core
recipes for learning to classify in the presence of misspecification.
Finally we study our algorithm for learning halfspaces under Massart noise
empirically and find that it exhibits some appealing fairness properties.
Comment: 51 pages, comments welcome
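As a quick check of the abstract's remark that logistic regression belongs to this family, the short derivation below assumes labels $Y \in \{-1, +1\}$ and a generalized linear model of the form $\mathbb{E}[Y \mid X] = \sigma(\langle w^*, X \rangle)$; the notation is an assumption made here for illustration.

```latex
\Pr[Y = 1 \mid X] = \frac{1}{1 + e^{-z}}, \qquad z = \langle w^*, X \rangle
\;\Longrightarrow\;
\mathbb{E}[Y \mid X] = \frac{2}{1 + e^{-z}} - 1
                     = \frac{1 - e^{-z}}{1 + e^{-z}}
                     = \tanh(z/2).
```

The link $\sigma(z) = \tanh(z/2)$ is odd, monotone, and $\tfrac{1}{2}$-Lipschitz, so it satisfies the three conditions named in the abstract.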