Learning from the Crowd with Pairwise Comparison
Efficient learning of halfspaces is arguably one of the most important
problems in machine learning and statistics. With the unprecedented growth of
large-scale data sets, it has become ubiquitous to appeal to the crowd for data
annotation, and the central problem that has attracted a surge of recent interest
is how one can provably learn the underlying halfspace from highly noisy
crowd feedback. On the other hand, a large body of recent work has been
dedicated to the problem of learning with not only labels, but also pairwise
comparisons, since in many cases it is easier to compare than to label. In this
paper we study the problem of learning halfspaces from the crowd under the
realizable PAC learning setting, and we assume that the crowd workers can
provide (noisy) labels or pairwise comparison tags upon request. We show that
with a powerful boosting framework, together with our novel design of a
filtering process, the overhead (to be defined) of the crowd acts as a
constant, whereas the natural extension of standard approaches to the crowd
setting leads to an overhead that grows with the size of the data set.
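A hypothetical one-dimensional sketch of why comparisons can slash labeling cost (this is not the paper's boosting-plus-filtering algorithm, and it assumes noiseless responses): for a threshold function, pairwise comparisons alone suffice to sort the points, after which a binary search locates the decision boundary with only O(log n) label queries instead of one label per point.

```python
import random

def learn_threshold(points, label_oracle):
    """points: floats; label_oracle(x) -> +1 if x >= threshold, else -1."""
    pts = sorted(points)              # comparisons alone suffice to sort
    lo, hi = 0, len(pts)              # labels only steer a binary search
    while lo < hi:
        mid = (lo + hi) // 2
        if label_oracle(pts[mid]) == 1:
            hi = mid                  # boundary is at or before pts[mid]
        else:
            lo = mid + 1              # boundary is strictly after pts[mid]
    # smallest point labeled +1 (all points are +1 from here on)
    return pts[lo] if lo < len(pts) else float("inf")

random.seed(0)
xs = [random.uniform(0, 1) for _ in range(1000)]
theta = 0.37                          # unknown threshold, for illustration
boundary = learn_threshold(xs, lambda x: 1 if x >= theta else -1)
```

With 1000 points, the search above issues only about 10 label queries; handling noisy comparisons and labels is exactly where the boosting and filtering machinery of the paper comes in.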
Crowdsourced PAC Learning under Classification Noise
In this paper, we analyze PAC learnability from labels produced by
crowdsourcing. In our setting, unlabeled examples are drawn from a distribution
and labels are crowdsourced from workers who operate under classification
noise, each with their own noise parameter. We develop an end-to-end
crowdsourced PAC learning algorithm that takes unlabeled data points as input
and outputs a trained classifier. Our three-step algorithm incorporates
majority voting, pure-exploration bandits, and noisy-PAC learning. We prove
several guarantees on the number of tasks labeled by workers for PAC learning
in this setting and show that our algorithm improves upon the baseline by
reducing the total number of tasks given to workers. We demonstrate the
robustness of our algorithm by exploring its application to additional
realistic crowdsourcing settings.
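A toy illustration of the first of the three steps (not the paper's full pipeline): if each worker independently flips the true label with her own noise rate below 1/2, a majority vote over several workers drives the aggregate error far below any single worker's noise rate.

```python
import random

def worker_label(true_label, eta, rng):
    # A worker under classification noise: flips the label with probability eta.
    return -true_label if rng.random() < eta else true_label

def majority_vote(true_label, etas, rng):
    votes = [worker_label(true_label, eta, rng) for eta in etas]
    return 1 if sum(votes) > 0 else -1

rng = random.Random(1)
etas = [0.3] * 15                     # 15 workers, each 30% noise (odd count avoids ties)
trials = 2000
errors = sum(majority_vote(1, etas, rng) != 1 for _ in range(trials))
error_rate = errors / trials          # far below the 0.3 single-worker rate
```

The error of the vote decays exponentially in the number of (independent, better-than-random) workers, which is what makes the majority-voting step an effective label-denoising front end.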
An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance
Consider a requester who wishes to crowdsource a series of identical binary
labeling tasks to a pool of workers so as to achieve an assured accuracy for
each task, in a cost-optimal way. The workers are heterogeneous with unknown
but fixed qualities and their costs are private. The problem is to select for
each task an optimal subset of workers so that the outcome obtained from the
selected workers guarantees a target accuracy level. The problem is a
challenging one even in a non-strategic setting, since the accuracy of the
aggregated label depends on the unknown qualities. We develop a novel multi-armed
bandit (MAB) mechanism for solving this problem. First, we propose a framework,
Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained
Confidence Bound for a Non-Strategic setting (CCB-NS). We derive an upper bound
on the number of time steps in which the algorithm chooses a sub-optimal set;
the bound depends on the target accuracy level and the true qualities. A more challenging
situation arises when the requester not only has to learn the qualities of the
workers but also to elicit their true costs. We modify the CCB-NS algorithm to
obtain an adaptive, exploration-separated algorithm which we call Constrained
Confidence Bound for a Strategic setting (CCB-S). The CCB-S algorithm
produces an ex-post monotone allocation rule and thus can be transformed into
an ex-post incentive compatible and ex-post individually rational mechanism
that learns the qualities of the workers and guarantees a given target accuracy
level in a cost-optimal way. We provide a lower bound on the number of times
any algorithm must select a sub-optimal set, and this lower bound matches our
upper bound up to a constant factor. We provide insights on the practical
implementation of this framework through an illustrative example, and we show
the efficacy of our algorithms through simulations.
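A deliberately simplified sketch in the spirit of confidence-bound worker selection (this is not the CCB-NS algorithm itself, and it assumes the true answer is eventually revealed so quality estimates can be updated): maintain a Hoeffding-style upper confidence bound on each worker's unknown quality and, each round, query the k workers with the highest bounds.

```python
import math
import random

def ucb(successes, pulls, t):
    # Empirical quality plus a Hoeffding-style exploration bonus.
    if pulls == 0:
        return 1.0                    # force at least one query per worker
    return successes / pulls + math.sqrt(math.log(t) / (2 * pulls))

rng = random.Random(7)
qualities = [0.95, 0.9, 0.4, 0.35]    # unknown to the algorithm
successes = [0] * 4
pulls = [0] * 4
k = 2                                 # workers queried per task

for t in range(1, 1001):
    bounds = [ucb(successes[i], pulls[i], t) for i in range(4)]
    chosen = sorted(range(4), key=lambda i: -bounds[i])[:k]
    for i in chosen:
        correct = rng.random() < qualities[i]   # revealed ground truth (simplification)
        successes[i] += int(correct)
        pulls[i] += 1
```

After enough rounds the two genuinely best workers dominate the queries, while each low-quality worker is selected only a logarithmic number of times, mirroring the sub-optimal-set bounds discussed above; the strategic version must additionally elicit costs truthfully.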
Semi-verified PAC Learning from the Crowd with Pairwise Comparisons
We study the problem of crowdsourced PAC learning of threshold functions with
pairwise comparisons. This is a challenging problem and only recently have
query-efficient algorithms been established in the scenario where the majority
of the crowd are perfect. In this work, we investigate the significantly more
challenging case in which the majority are incorrect, which in general renders
learning impossible. We show that under the semi-verified model of
Charikar et al. (2017), where we have (limited) access to a trusted oracle who
always returns the correct annotation, it is possible to PAC learn the
underlying hypothesis class while drastically mitigating the labeling cost via
the more easily obtained comparison queries. Orthogonal to recent developments
in semi-verified or list-decodable learning that crucially rely on data
distributional assumptions, our PAC guarantee holds by exploiting the wisdom of
the crowd.
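A hypothetical sketch of the semi-verified idea (not the paper's Filter algorithm): when most workers may answer adversarially, a handful of trusted-oracle labels on audit points can estimate each worker's agreement rate, and workers who mostly disagree with the oracle are discarded before any aggregation.

```python
import random

rng = random.Random(3)
n_audit = 40                          # points labeled by the trusted oracle
truth = [rng.choice([-1, 1]) for _ in range(n_audit)]

def make_worker(p_correct):
    # A worker who answers each audit point correctly with probability p_correct.
    return [t if rng.random() < p_correct else -t for t in truth]

# 7 of 10 workers are mostly wrong; plain majority vote would fail here.
workers = [make_worker(0.9) for _ in range(3)] + [make_worker(0.2) for _ in range(7)]

# Keep only workers whose agreement with the oracle exceeds 1/2.
kept = [w for w in workers if sum(a == t for a, t in zip(w, truth)) / n_audit > 0.5]
```

The oracle budget stays small (40 labels screens all ten workers at once), and the surviving workers form a majority-good crowd on which standard comparison-based learning applies.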