Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting
In this paper, we consider the problem of actively learning a linear
classifier through query synthesis where the learner can construct artificial
queries in order to estimate the true decision boundaries. This problem has
recently gained interest in automated science and adversarial reverse
engineering, settings for which only heuristic algorithms are known. In such
applications, queries can be constructed de novo to elicit information (e.g.,
automated science) or to evade detection with minimal cost (e.g., adversarial
reverse engineering). We develop a general framework, called dimension coupling
(DC), that 1) reduces a d-dimensional learning problem to d-1 low-dimensional
sub-problems, 2) solves each sub-problem efficiently, 3) appropriately
aggregates the results and outputs a linear classifier, and 4) provides a
theoretical guarantee for all possible schemes of aggregation. The proposed
method is provably resilient to noise. We show that the DC framework avoids the
curse of dimensionality: its computational complexity scales linearly with the
dimension. Moreover, we show that the query complexity of DC is near optimal
(within a constant factor of the optimal algorithm). To further support our
theoretical analysis, we compare the performance of DC with existing methods
and observe that DC consistently outperforms the prior art in query
complexity while often running orders of magnitude faster.
Comment: Accepted by AAAI 201
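The coupling idea in steps 1-3 can be illustrated with a toy sketch. This is not the paper's DC algorithm, but a noiseless caricature of it: each dimension i is coupled with dimension 0, the label flip angle is binary-searched over synthesized queries on the unit circle of that coordinate plane, and the ratio w_i/w_0 is read off. The `oracle` interface and the function name are illustrative assumptions, as is the convention that the first coordinate of the true normal is positive.

```python
import math

def synthesize_and_learn(oracle, d, iters=60):
    """Recover the unit normal w of a homogeneous halfspace sign(<w, x>)
    via synthesized queries, assuming w[0] > 0 and a noiseless label
    oracle mapping a point to +1 or -1 (illustrative interface)."""
    w = [0.0] * d
    w[0] = 1.0
    for i in range(1, d):
        # Couple dimension i with dimension 0: query on the unit circle
        # of the (0, i)-plane.  The label is +1 at angle 0 and -1 at
        # angle pi, flipping exactly once in between at phi + pi/2,
        # where phi = atan2(w_i, w_0); binary-search that flip angle.
        lo, hi = 0.0, math.pi
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            x = [0.0] * d
            x[0], x[i] = math.cos(mid), math.sin(mid)
            if oracle(x) > 0:
                lo = mid
            else:
                hi = mid
        flip = 0.5 * (lo + hi)
        w[i] = math.tan(flip - math.pi / 2)  # equals w_i / w_0
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]
```

Each coordinate is pinned down by its own one-dimensional binary search, so the total query and compute cost grows linearly in d, mirroring the scaling the abstract claims for DC.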
Active classification with comparison queries
We study an extension of active learning in which the learning algorithm may
ask the annotator to compare the distances of two examples from the boundary of
their label-class. For example, in a recommendation system application (say for
restaurants), the annotator may be asked whether she liked or disliked a
specific restaurant (a label query), or which of two restaurants she liked
more (a comparison query).
We focus on the class of half spaces, and show that under natural
assumptions, such as large margin or bounded bit-description of the input
examples, it is possible to reveal all the labels of a sample of size $n$
using approximately $O(\log n)$ queries. This implies an exponential improvement over
classical active learning, where only label queries are allowed. We complement
these results by showing that if any of these assumptions is removed then, in
the worst case, $\Omega(n)$ queries are required.
Our results follow from a new general framework of active learning with
additional queries. We identify a combinatorial dimension, called the
\emph{inference dimension}, that captures the query complexity when each
additional query is determined by $q$ examples (such as comparison queries,
each of which is determined by the two compared examples). Our results for half
spaces follow by bounding the inference dimension in the cases discussed above.
Comment: 23 pages (not including references), 1 figure. The new version contains a minor fix in the proof of Lemma 4.
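A toy one-dimensional illustration (not the paper's algorithm) of how comparison queries can cut label complexity: for points on a line against a hidden threshold, comparisons of adjacent points alone localize the boundary, after which a single label query fixes all n labels. The oracle names and interfaces below are assumptions made for this sketch.

```python
def infer_labels(points, label_q, closer_q):
    """Label n points on a line against a hidden threshold t using about
    log2(n) comparison queries plus a single label query.

    label_q(x)     -> +1 if x > t else -1          (label query)
    closer_q(x, y) -> True iff |x - t| < |y - t|   (comparison query)

    Assumes distinct points in general position (no ties with t).
    """
    xs = sorted(points)
    n = len(xs)
    # For sorted xs, closer_q(xs[i], xs[i+1]) holds exactly when
    # t < (xs[i] + xs[i+1]) / 2 -- a predicate monotone in i, so we
    # can binary-search for the smallest i where it holds.
    lo, hi = 0, n - 1          # hi = n - 1 doubles as a sentinel
    while lo < hi:
        mid = (lo + hi) // 2
        if closer_q(xs[mid], xs[mid + 1]):
            hi = mid
        else:
            lo = mid + 1
    i = lo
    # Now t lies between midpoint(i-1) and midpoint(i), so every point
    # before xs[i] is below t and every point after is above it; one
    # label query resolves the ambiguous point xs[i] itself.
    side = label_q(xs[i])
    return {x: (-1 if j < i else side if j == i else 1)
            for j, x in enumerate(xs)}
```

For six points this spends at most three comparisons and one label query, versus six label queries naively; the gap between roughly log n queries with comparisons and linearly many without them is the flavor of the improvement the abstract proves for halfspaces.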