3 research outputs found

BUOCA: Budget-Optimized Crowd Worker Allocation

Authors: Betke Margrit, Guo Lei, Ishwar Prakash, Lai Sha, Mays Kate K., Sameki Mehrnoosh
Publication date: 11/01/2019

Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. Here we show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze easy-to-label data and more workers to analyze data that requires extra scrutiny. Our main contribution is to show how the number of workers allocated to each task can be computed optimally from task features alone, without using worker profiles. Our target tasks are delineating cells in microscopy images and analyzing the sentiment toward the 2016 U.S. presidential candidates in tweets. We first propose an algorithm that computes budget-optimized crowd worker allocation (BUOCA). We next train a machine learning system (BUOCA-ML) that predicts the optimal number of crowd workers needed to maximize labeling accuracy. We show that the computed allocation can yield large savings in the crowdsourcing budget (up to 49 percentage points) while maintaining labeling accuracy. Finally, we envisage a human-machine system for performing budget-optimized data analysis at a scale beyond the feasibility of crowdsourcing. First author draft.
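To make the idea of "fewer workers for easy data, more for hard data" concrete, here is a minimal sketch that splits a fixed labeling budget across tasks according to a per-task difficulty score. This is not the BUOCA algorithm or BUOCA-ML: the function `allocate_workers`, the difficulty scores (assumed to come from some feature-based predictor), the `min_workers`/`max_workers` limits, and the greedy "difficulty per assigned worker" rule are all hypothetical choices made for illustration.

```python
from typing import List


def allocate_workers(difficulty: List[float], budget: int,
                     min_workers: int = 1, max_workers: int = 5) -> List[int]:
    """Split a fixed labeling budget across tasks by estimated difficulty.

    Hypothetical illustration only, not the BUOCA algorithm: easy tasks keep
    the minimum number of workers and harder tasks receive more.
    """
    if min_workers < 1:
        raise ValueError("min_workers must be at least 1")
    n = len(difficulty)
    if budget < n * min_workers:
        raise ValueError("budget cannot give every task min_workers")
    alloc = [min_workers] * n
    remaining = budget - n * min_workers
    # Greedy rule (an assumption): give each extra worker to the uncapped task
    # with the highest difficulty per worker already assigned.
    while remaining > 0:
        open_tasks = [i for i in range(n) if alloc[i] < max_workers]
        if not open_tasks:
            break  # every task is at max_workers; the rest of the budget is unspent
        best = max(open_tasks, key=lambda i: difficulty[i] / alloc[i])
        alloc[best] += 1
        remaining -= 1
    return alloc
```

For example, `allocate_workers([0.1, 0.9, 0.5], budget=7)` returns `[1, 4, 2]`: the easiest task keeps a single worker while the hardest receives four. A uniform policy would instead spread the same seven labels roughly evenly regardless of difficulty.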

Boston University Institutional Repository (OpenBU)

BUOCA: Budget-Optimized Crowd Worker Allocation

Authors: Betke Margrit, Guo Lei, Ishwar Prakash, Lai Sha, Mays Kate K., Sameki Mehrnoosh
Publication date: 11/01/2019

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

On the efficiency of data collection for crowdsourced classification

Authors: Jennings NR, Manino E, Tran-Thanh L
Publication venue: International Joint Conferences on Artificial Intelligence
Publication date: 13/07/2018

The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper, we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.
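The contrast between non-adaptive uniform allocation and adaptive uncertainty sampling can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the paper: binary tasks, every worker correct with the same probability `p_correct`, and majority-vote aggregation. Under these assumptions the most uncertain task is the one with the smallest vote margin. The function `simulate` and its parameters are hypothetical, and information gain maximisation and the paper's random-walk bounds are not reproduced.

```python
import random


def simulate(policy: str, n_tasks: int, budget: int, p_correct: float,
             seed: int = 0) -> float:
    """Majority-vote accuracy of a label collection policy (hypothetical).

    Assumptions: binary tasks, every worker is correct with probability
    p_correct, and labels are aggregated by majority vote.
    """
    rng = random.Random(seed)
    truth = [rng.choice([-1, 1]) for _ in range(n_tasks)]
    votes = [0] * n_tasks   # running sum of +1/-1 labels per task
    counts = [0] * n_tasks  # number of labels collected per task

    def collect(i: int) -> None:
        # A simulated worker returns the true label with probability p_correct.
        label = truth[i] if rng.random() < p_correct else -truth[i]
        votes[i] += label
        counts[i] += 1

    for step in range(budget):
        if policy == "uniform":
            i = step % n_tasks  # non-adaptive round-robin
        elif policy == "uncertainty":
            # Adaptive: smallest vote margin first; fewest labels breaks ties.
            i = min(range(n_tasks), key=lambda j: (abs(votes[j]), counts[j]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        collect(i)

    # Majority vote; an exact tie (margin 0) is broken toward +1.
    predictions = [1 if v >= 0 else -1 for v in votes]
    return sum(p == t for p, t in zip(predictions, truth)) / n_tasks
```

Averaging `simulate("uniform", ...)` and `simulate("uncertainty", ...)` over many seeds with the same budget gives a rough empirical view of the accuracy gap that the paper analyzes theoretically. The adaptive rule spends extra labels only on tasks whose votes are still close, instead of labeling every task the same number of times.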

Crossref

Southampton (e-Prints Soton)

Spiral - Imperial College Digital Repository