Optimal Inference in Crowdsourced Classification via Belief Propagation
Crowdsourcing systems are popular for solving large-scale labelling tasks
with low-paid workers. We study the problem of recovering the true labels from
the possibly erroneous crowdsourced labels under the popular Dawid-Skene model.
To address this inference problem, several algorithms have recently been
proposed, but the best known guarantee is still significantly larger than the
fundamental limit. We close this gap by introducing a tighter lower bound on
the fundamental limit and proving that Belief Propagation (BP) exactly matches
this lower bound. The guaranteed optimality of BP is the strongest in the sense
that it is information-theoretically impossible for any other algorithm to
correctly label a larger fraction of the tasks. Experimental results suggest
that BP is close to optimal for all regimes considered and improves upon
competing state-of-the-art algorithms.
Comment: This article is partially based on preliminary results published in
the proceedings of the 33rd International Conference on Machine Learning (ICML
2016).
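
To make the inference problem above concrete, here is a minimal sketch for a single binary task. It is not the Belief Propagation algorithm from the paper. It assumes the one-coin special case of the Dawid-Skene model (each worker answers correctly with a fixed probability), a uniform prior over the two labels, and that worker reliabilities are already known, in which case the MAP label is a log-odds weighted vote. The function names and the example numbers are illustrative assumptions only.

```python
import math

def majority_vote(answers):
    """Plain majority vote over +1/-1 answers for one task (ties go to +1)."""
    return 1 if sum(answers) >= 0 else -1

def oracle_map_label(answers, reliabilities):
    """MAP label when each worker's probability of being correct is known.

    Assumes the one-coin Dawid-Skene model and a uniform prior on the label:
    each answer is weighted by the log-odds of its worker being correct.
    """
    score = sum(a * math.log(p / (1.0 - p))
                for a, p in zip(answers, reliabilities))
    return 1 if score >= 0 else -1

# Hypothetical task: one reliable worker says +1, two weak workers say -1.
answers = [+1, -1, -1]
reliabilities = [0.9, 0.6, 0.6]

mv = majority_vote(answers)                       # follows the two weak workers
oracle = oracle_map_label(answers, reliabilities) # follows the reliable worker
print(mv, oracle)
```

In practice the reliabilities are unknown and must be inferred jointly with the labels, which is the harder problem the abstract addresses.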
Iterative Bayesian Learning for Crowdsourced Regression
Crowdsourcing platforms have emerged as popular venues for purchasing human
intelligence at low cost for large volumes of tasks. As many low-paid workers
are prone to give noisy answers, a common practice is to add redundancy by
assigning multiple workers to each task and then simply average out these
answers. However, to fully harness the wisdom of the crowd, one needs to learn
the heterogeneous quality of each worker. We resolve this fundamental challenge
in crowdsourced regression tasks, i.e., tasks whose answers take continuous
values, where identifying good or bad workers is considerably harder than in
a classification setting with discrete labels. In particular, we introduce a
Bayesian iterative scheme and show that it provably achieves the optimal mean
squared error. Our evaluations on synthetic and real-world datasets support our
theoretical results and show the superiority of the proposed scheme.
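
The gap between simple averaging and quality-aware aggregation can be illustrated with a small sketch. It is not the Bayesian iterative scheme from the paper. It assumes each worker's answer is the true value plus independent zero-mean noise with a known variance, in which case inverse-variance weighting is the minimum-variance unbiased linear estimate. The function names and numbers are hypothetical.

```python
def simple_average(answers):
    """Baseline: treat every worker as equally reliable."""
    return sum(answers) / len(answers)

def inverse_variance_average(answers, variances):
    """Weight each answer by 1 / (noise variance of its worker).

    Assumes the noise variances are known; in a real system they would
    have to be estimated from the workers' answers.
    """
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * a for w, a in zip(weights, answers))
    return weighted_sum / sum(weights)

# Hypothetical task: two careful workers and one very noisy worker.
answers = [10.0, 12.0, 20.0]
variances = [1.0, 1.0, 100.0]

simple = simple_average(answers)
weighted = inverse_variance_average(answers, variances)
print(simple, weighted)
```

The weighted estimate stays close to the two careful workers, while the simple average is pulled toward the noisy answer.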