An Adaptive Strategy for Active Learning with Smooth Decision Boundary
We present the first adaptive strategy for active learning in the setting of
classification with smooth decision boundary. The problem of adaptivity (to
unknown distributional parameters) has remained open since the seminal work
of Castro and Nowak (2007), which first established (active learning) rates for
this setting. While some recent advances on this problem establish adaptive
rates in the case of univariate data, adaptivity in the more practical setting
of multivariate data has so far remained elusive. Combining insights from
various recent works, we show that, for the multivariate case, a careful
reduction to univariate-adaptive strategies yields near-optimal rates without
prior knowledge of distributional parameters.
The Power of Localization for Efficiently Learning Linear Separators with Noise
We introduce a new approach for designing computationally efficient learning
algorithms that are tolerant to noise, and demonstrate its effectiveness by
designing algorithms with improved noise tolerance guarantees for learning
linear separators.
We consider both the malicious noise model and the adversarial label noise
model. For malicious noise, where the adversary can corrupt both the label and
the features, we provide a polynomial-time algorithm for learning linear
separators under isotropic log-concave distributions that can
tolerate a nearly information-theoretically optimal noise rate. For the
adversarial label noise model, where the distribution over the feature vectors
is unchanged and the overall probability of a noisy label is bounded by a given
noise rate, we also give a polynomial-time algorithm for learning linear
separators under isotropic log-concave distributions that tolerates such noise.
We show that, in the active learning model, our algorithms achieve a label
complexity whose dependence on the error parameter is
polylogarithmic. This provides the first polynomial-time active learning
algorithm for learning linear separators in the presence of malicious noise or
adversarial label noise.
Comment: Contains improved label complexity analysis communicated to us by
Steve Hanneke.
Statistical Active Learning Algorithms for Noise Tolerance and Differential Privacy
We describe a framework for designing efficient active learning algorithms
that are tolerant to random classification noise and are
differentially-private. The framework is based on active learning algorithms
that are statistical in the sense that they rely on estimates of expectations
of functions of filtered random examples. It builds on the powerful statistical
query framework of Kearns (1993).
We show that any efficient active statistical learning algorithm can be
automatically converted to an efficient active learning algorithm which is
tolerant to random classification noise as well as other forms of
"uncorrelated" noise. The complexity of the resulting algorithms has
information-theoretically optimal quadratic dependence on a quantity
determined by the noise rate.
We show that commonly studied concept classes including thresholds,
rectangles, and linear separators can be efficiently actively learned in our
framework. These results combined with our generic conversion lead to the first
computationally-efficient algorithms for actively learning some of these
concept classes in the presence of random classification noise that provide
exponential improvement in the dependence on the error over their
passive counterparts. In addition, we show that our algorithms can be
automatically converted to efficient active differentially-private algorithms.
This leads to the first differentially-private active learning algorithms with
exponential label savings over the passive case.
Comment: Extended abstract appears in NIPS 201
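As a rough illustration of the exponential label savings described in this abstract, the sketch below actively learns a one-dimensional threshold by binary search over a sorted unlabeled pool. It assumes noise-free labels and is not the statistical-query construction from the paper; the names `active_learn_threshold` and `query_label` are hypothetical.

```python
import random


def active_learn_threshold(points, query_label):
    """Locate the smallest positively labeled point with few label queries.

    Assumes (for illustration only) noise-free labels of the form
    label(x) = 1 if x >= t else 0 for some unknown threshold t.
    Returns (first_positive_point_or_None, number_of_label_queries).
    """
    xs = sorted(points)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are labeled 0, xs[hi:] are labeled 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(xs[mid]) == 1:
            hi = mid          # first positive is at mid or to its left
        else:
            lo = mid + 1      # first positive is strictly to the right
    first_positive = xs[lo] if lo < len(xs) else None
    return first_positive, queries


if __name__ == "__main__":
    random.seed(0)
    pool = [random.random() for _ in range(1000)]
    true_threshold = 0.37  # hypothetical target, unknown to the learner
    estimate, n_queries = active_learn_threshold(
        pool, lambda x: int(x >= true_threshold)
    )
    print(f"estimate={estimate:.4f} using {n_queries} of {len(pool)} labels")
```

A passive learner would need labels for a number of points proportional to the pool size to pin down the same boundary point, while the binary search needs only logarithmically many queries in this idealized setting.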
A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed in recent decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work.
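To make the stream-based setting concrete, here is a minimal sketch of one common selection rule, uncertainty sampling with a label budget. It is an illustrative assumption rather than a method taken from the survey: the class `OnlineLogistic`, the function `stream_active_learning`, and the `margin` and `budget` parameters are all hypothetical.

```python
import math
import random


class OnlineLogistic:
    """Tiny logistic-regression model trained by one SGD step per label."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        g = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


def stream_active_learning(stream, oracle, model, margin=0.2, budget=50):
    """Query a label only when the model is uncertain about the arriving point.

    A point is considered uncertain when its predicted probability lies
    within `margin` of 0.5. Returns the number of labels requested.
    """
    queried = 0
    for x in stream:
        if queried >= budget:
            break
        if abs(model.predict_proba(x) - 0.5) <= margin:
            model.update(x, oracle(x))  # pay for one label, then learn from it
            queried += 1
    return queried


if __name__ == "__main__":
    random.seed(1)
    stream = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
    model = OnlineLogistic(dim=2)
    used = stream_active_learning(stream, lambda x: int(x[0] + x[1] > 0), model)
    print(f"labels requested: {used} of {len(stream)} observations")
```

Unlike the pool-based setting, each decision here is made once, when the observation arrives, and cannot be revisited later.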
Surrogate Losses in Passive and Active Learning
Active learning is a type of sequential design for supervised machine
learning, in which the learning algorithm sequentially requests the labels of
selected instances from a large pool of unlabeled data points. The objective is
to produce a classifier of relatively low risk, as measured under the 0-1 loss,
ideally using fewer label requests than the number of random labeled data
points sufficient to achieve the same. This work investigates the potential
uses of surrogate loss functions in the context of active learning.
Specifically, it presents an active learning algorithm based on an arbitrary
classification-calibrated surrogate loss function, along with an analysis of
the number of label requests sufficient for the classifier returned by the
algorithm to achieve a given risk under the 0-1 loss. Interestingly, these
results cannot be obtained by simply optimizing the surrogate risk via active
learning to an extent sufficient to provide a guarantee on the 0-1 loss, as is
common practice in the analysis of surrogate losses for passive learning. Some
of the results have additional implications for the use of surrogate losses in
passive learning.
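For context, the "common practice" in passive learning that the abstract refers to can be summarized by the standard comparison inequality for margin-based losses. The notation below ($R$, $R_\phi$, $\psi$) is an assumption introduced for illustration and does not come from the abstract itself.

```latex
% 0-1 risk and surrogate risk of a real-valued predictor h, with Y in {-1, +1}
R(h) = \Pr\big(\operatorname{sign}(h(X)) \neq Y\big),
\qquad
R_\phi(h) = \mathbb{E}\big[\phi\big(Y\,h(X)\big)\big].

% If \phi is classification-calibrated, there is a nondecreasing function \psi
% with \psi(\theta) > 0 for \theta > 0 such that, for every measurable h,
\psi\big(R(h) - R^{*}\big) \;\le\; R_\phi(h) - R_\phi^{*},

% where R^{*} and R_\phi^{*} denote the infima of the respective risks.
```

Under this inequality, driving the excess surrogate risk to zero forces the excess 0-1 risk to zero, which is the route the abstract says does not suffice to obtain its active learning results.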
Learning with non-Standard Supervision
Machine learning has enjoyed astounding practical
success in a wide range of applications in recent
years, practical success that often hurries ahead of our
theoretical understanding. The standard framework for machine
learning theory assumes full supervision, that is, training data
consists of correctly labeled iid examples from the same task
that the learned classifier is supposed to be applied to.
However, many practical applications successfully make use of
the sheer abundance of data that is currently produced. Such
data may not be labeled or may be collected from various
sources.
The focus of this thesis is to provide theoretical analysis of
machine learning regimes where the learner is given (possibly
large amounts of) such imperfect training data. In
particular, we investigate the benefits and limitations of
learning with unlabeled data in semi-supervised learning and
active learning as well as benefits and limitations of learning
from data that has been generated by a task that is different
from the target task (domain adaptation learning).
For all three settings, we propose Probabilistic Lipschitzness
to model the relatedness between the labels and the underlying
domain space, and we discuss our suggested notion by comparing
it to other common data assumptions.