Learning from the Crowd with Pairwise Comparison
Efficient learning of halfspaces is arguably one of the most important
problems in machine learning and statistics. With the unprecedented growth of
large-scale data sets, it has become ubiquitous to appeal to the crowd for data
annotation, and the central problem that has attracted a surge of recent interest
is how one can provably learn the underlying halfspace from highly noisy
crowd feedback. On the other hand, a large body of recent work has been
dedicated to the problem of learning with not only labels, but also pairwise
comparisons, since in many cases it is easier to compare than to label. In this
paper we study the problem of learning halfspaces from the crowd under the
realizable PAC learning setting, and we assume that the crowd workers can
provide (noisy) labels or pairwise comparison tags upon request. We show that
with a powerful boosting framework, together with our novel design of a
filtering process, the overhead (to be defined) of the crowd acts as a
constant, whereas the natural extension of standard approaches to the crowd
setting leads to an overhead that grows with the size of the data set.
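A hypothetical one-dimensional sketch of why comparisons can slash labeling cost (this is not the paper's boosting-plus-filtering algorithm, and it assumes noiseless responses): for a threshold function, pairwise comparisons alone suffice to sort the points, after which a binary search locates the decision boundary with only O(log n) label queries instead of one label per point.

```python
import random

def learn_threshold(points, label_oracle):
    """points: floats; label_oracle(x) -> +1 if x >= threshold, else -1."""
    pts = sorted(points)              # comparisons alone suffice to sort
    lo, hi = 0, len(pts)              # labels only steer a binary search
    while lo < hi:
        mid = (lo + hi) // 2
        if label_oracle(pts[mid]) == 1:
            hi = mid                  # boundary is at or before pts[mid]
        else:
            lo = mid + 1              # boundary is strictly after pts[mid]
    # smallest point labeled +1 (all points are +1 from here on)
    return pts[lo] if lo < len(pts) else float("inf")

random.seed(0)
xs = [random.uniform(0, 1) for _ in range(1000)]
theta = 0.37                          # unknown threshold, for illustration
boundary = learn_threshold(xs, lambda x: 1 if x >= theta else -1)
```

With 1000 points, the search above issues only about 10 label queries; handling noisy comparisons and labels is exactly where the boosting and filtering machinery of the paper comes in.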
Crowdsourced PAC Learning under Classification Noise
In this paper, we analyze PAC learnability from labels produced by
crowdsourcing. In our setting, unlabeled examples are drawn from a distribution
and labels are crowdsourced from workers who operate under classification
noise, each with their own noise parameter. We develop an end-to-end
crowdsourced PAC learning algorithm that takes unlabeled data points as input
and outputs a trained classifier. Our three-step algorithm incorporates
majority voting, pure-exploration bandits, and noisy-PAC learning. We prove
several guarantees on the number of tasks labeled by workers for PAC learning
in this setting and show that our algorithm improves upon the baseline by
reducing the total number of tasks given to workers. We demonstrate the
robustness of our algorithm by exploring its application to additional
realistic crowdsourcing settings.
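A toy illustration of the first of the three steps (not the paper's full pipeline): if each worker independently flips the true label with her own noise rate below 1/2, a majority vote over several workers drives the aggregate error far below any single worker's noise rate.

```python
import random

def worker_label(true_label, eta, rng):
    # A worker under classification noise: flips the label with probability eta.
    return -true_label if rng.random() < eta else true_label

def majority_vote(true_label, etas, rng):
    votes = [worker_label(true_label, eta, rng) for eta in etas]
    return 1 if sum(votes) > 0 else -1

rng = random.Random(1)
etas = [0.3] * 15                     # 15 workers, each 30% noise (odd count avoids ties)
trials = 2000
errors = sum(majority_vote(1, etas, rng) != 1 for _ in range(trials))
error_rate = errors / trials          # far below the 0.3 single-worker rate
```

The error of the vote decays exponentially in the number of (independent, better-than-random) workers, which is what makes the majority-voting step an effective label-denoising front end.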
An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance
Consider a requester who wishes to crowdsource a series of identical binary
labeling tasks to a pool of workers so as to achieve an assured accuracy for
each task, in a cost-optimal way. The workers are heterogeneous with unknown
but fixed qualities and their costs are private. The problem is to select for
each task an optimal subset of workers so that the outcome obtained from the
selected workers guarantees a target accuracy level. The problem is a
challenging one even in a non-strategic setting, since the accuracy of the
aggregated label depends on the unknown qualities. We develop a novel multi-armed
bandit (MAB) mechanism for solving this problem. First, we propose a framework,
Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained
Confidence Bound for a Non-Strategic setting (CCB-NS). We derive an upper bound
on the number of time steps in which the algorithm chooses a sub-optimal set;
the bound depends on the target accuracy level and the true qualities. A more challenging
situation arises when the requester not only has to learn the qualities of the
workers but also to elicit their true costs. We modify the CCB-NS algorithm to
obtain an adaptive, exploration-separated algorithm which we call Constrained
Confidence Bound for a Strategic setting (CCB-S). The CCB-S algorithm
produces an ex-post monotone allocation rule and thus can be transformed into
an ex-post incentive compatible and ex-post individually rational mechanism
that learns the qualities of the workers and guarantees a given target accuracy
level in a cost-optimal way. We provide a lower bound on the number of times
any algorithm must select a sub-optimal set, and this lower bound matches our
upper bound up to a constant factor. We provide insights on the practical
implementation of this framework through an illustrative example, and we show
the efficacy of our algorithms through simulations.
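A deliberately simplified sketch in the spirit of confidence-bound worker selection (this is not the CCB-NS algorithm itself, and it assumes the true answer is eventually revealed so quality estimates can be updated): maintain a Hoeffding-style upper confidence bound on each worker's unknown quality and, each round, query the k workers with the highest bounds.

```python
import math
import random

def ucb(successes, pulls, t):
    # Empirical quality plus a Hoeffding-style exploration bonus.
    if pulls == 0:
        return 1.0                    # force at least one query per worker
    return successes / pulls + math.sqrt(math.log(t) / (2 * pulls))

rng = random.Random(7)
qualities = [0.95, 0.9, 0.4, 0.35]    # unknown to the algorithm
successes = [0] * 4
pulls = [0] * 4
k = 2                                 # workers queried per task

for t in range(1, 1001):
    bounds = [ucb(successes[i], pulls[i], t) for i in range(4)]
    chosen = sorted(range(4), key=lambda i: -bounds[i])[:k]
    for i in chosen:
        correct = rng.random() < qualities[i]   # revealed ground truth (simplification)
        successes[i] += int(correct)
        pulls[i] += 1
```

After enough rounds the two genuinely best workers dominate the queries, while each low-quality worker is selected only a logarithmic number of times, mirroring the sub-optimal-set bounds discussed above; the strategic version must additionally elicit costs truthfully.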
Semi-verified PAC Learning from the Crowd with Pairwise Comparisons
We study the problem of crowdsourced PAC learning of threshold functions with
pairwise comparisons. This is a challenging problem and only recently have
query-efficient algorithms been established in the scenario where the majority
of the crowd are perfect. In this work, we investigate the significantly more
challenging case in which the majority are incorrect, which in general renders
learning impossible. We show that under the semi-verified model of
Charikar et al. (2017), where we have (limited) access to a trusted oracle who
always returns the correct annotation, it is possible to PAC learn the
underlying hypothesis class while drastically mitigating the labeling cost via
the more easily obtained comparison queries. Orthogonal to recent developments
in semi-verified or list-decodable learning that crucially rely on data
distributional assumptions, our PAC guarantee holds by exploiting the wisdom of
the crowd.
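A hypothetical sketch of the semi-verified idea (not the paper's Filter algorithm): when most workers may answer adversarially, a handful of trusted-oracle labels on audit points can estimate each worker's agreement rate, and workers who mostly disagree with the oracle are discarded before any aggregation.

```python
import random

rng = random.Random(3)
n_audit = 40                          # points labeled by the trusted oracle
truth = [rng.choice([-1, 1]) for _ in range(n_audit)]

def make_worker(p_correct):
    # A worker who answers each audit point correctly with probability p_correct.
    return [t if rng.random() < p_correct else -t for t in truth]

# 7 of 10 workers are mostly wrong; plain majority vote would fail here.
workers = [make_worker(0.9) for _ in range(3)] + [make_worker(0.2) for _ in range(7)]

# Keep only workers whose agreement with the oracle exceeds 1/2.
kept = [w for w in workers if sum(a == t for a, t in zip(w, truth)) / n_audit > 0.5]
```

The oracle budget stays small (40 labels screens all ten workers at once), and the surviving workers form a majority-good crowd on which standard comparison-based learning applies.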