Bandit-Based Task Assignment for Heterogeneous Crowdsourcing
We consider a task assignment problem in crowdsourcing, which is aimed at
collecting as many reliable labels as possible within a limited budget. A
challenge in this scenario is how to cope with the diversity of tasks and the
task-dependent reliability of workers, e.g., a worker may be good at
recognizing the name of sports teams, but not be familiar with cosmetics
brands. We refer to this practical setting as heterogeneous crowdsourcing. In
this paper, we propose a contextual bandit formulation for task assignment in
heterogeneous crowdsourcing, which is able to deal with the
exploration-exploitation trade-off in worker selection. We also theoretically
investigate the regret bounds for the proposed method, and demonstrate its
practical usefulness experimentally.
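The exploration-exploitation trade-off in worker selection can be illustrated with a plain UCB1 loop over simulated workers (a minimal sketch with Bernoulli worker reliabilities; the contextual formulation proposed in the paper additionally conditions on task features, which this toy omits):

```python
import math
import random

def ucb1_select(counts, rewards, t):
    # Pick the worker with the highest upper confidence bound:
    # empirical reliability plus an exploration bonus that shrinks
    # as a worker is assigned more tasks.
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try each worker at least once
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + math.sqrt(2 * math.log(t) / counts[i]))

def run(reliabilities, budget, seed=0):
    # reliabilities: hypothetical per-worker probabilities of a correct label.
    rng = random.Random(seed)
    k = len(reliabilities)
    counts, rewards = [0] * k, [0.0] * k
    total = 0.0
    for t in range(1, budget + 1):
        i = ucb1_select(counts, rewards, t)
        r = 1.0 if rng.random() < reliabilities[i] else 0.0  # 1 = correct label
        counts[i] += 1
        rewards[i] += r
        total += r
    return counts, total

counts, total = run([0.5, 0.9, 0.6], budget=500)
```

Over the 500-round budget the loop concentrates most assignments on the most reliable worker while still occasionally probing the others.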
Efficient crowdsourcing of crowd-generated microtasks
Allowing members of the crowd to propose novel microtasks for one another is
an effective way to combine the efficiencies of traditional microtask work with
the inventiveness and hypothesis generation potential of human workers.
However, microtask proposal leads to a growing set of tasks that may overwhelm
limited crowdsourcer resources. Crowdsourcers can employ methods to utilize
their resources efficiently, but algorithmic approaches to efficient
crowdsourcing generally require a fixed task set of known size. In this paper,
we introduce *cost forecasting* as a means for a crowdsourcer to use efficient
crowdsourcing algorithms with a growing set of microtasks. Cost forecasting
allows the crowdsourcer to decide between eliciting new tasks from the crowd or
receiving responses to existing tasks based on whether or not new tasks will
cost less to complete than existing tasks, efficiently balancing resources as
crowdsourcing occurs. Experiments with real and synthetic crowdsourcing data
show that cost forecasting leads to improved accuracy. Accuracy and efficiency
gains for crowd-generated microtasks hold the promise to further leverage the
creativity and wisdom of the crowd, with applications such as generating more
informative and diverse training data for machine learning applications and
improving the performance of user-generated content and question-answering
platforms.
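The elicit-or-respond decision at the heart of cost forecasting can be sketched as follows (the fixed completion threshold and unit response costs are illustrative assumptions, not the forecasting model from the paper):

```python
def forecast_remaining_cost(responses_so_far, threshold=5):
    # Naive forecast: a task is "complete" after `threshold` responses,
    # so its remaining cost is the number of responses still needed.
    return max(threshold - responses_so_far, 0)

def next_action(existing_tasks, new_task_cost):
    # existing_tasks: dict task_id -> responses collected so far.
    # Route budget to whichever is forecast to be cheaper: finishing
    # the cheapest open task, or eliciting a brand-new task.
    open_costs = {tid: forecast_remaining_cost(n)
                  for tid, n in existing_tasks.items()
                  if forecast_remaining_cost(n) > 0}
    if not open_costs:
        return ("elicit_new", None)
    tid, cost = min(open_costs.items(), key=lambda kv: kv[1])
    return ("respond", tid) if cost <= new_task_cost else ("elicit_new", None)

action = next_action({"t1": 4, "t2": 1}, new_task_cost=5)
```

Here task "t1" needs only one more response, which is forecast to be cheaper than completing a brand-new task, so the crowdsourcer routes the next response to it.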
Optimum Statistical Estimation with Strategic Data Sources
We propose an optimum mechanism for providing monetary incentives to the data
sources of a statistical estimator such as linear regression, so that high
quality data is provided at low cost, in the sense that the sum of payments and
estimation error is minimized. The mechanism applies to a broad range of
estimators, including linear and polynomial regression, kernel regression, and,
under some additional assumptions, ridge regression. It also generalizes to
several objectives, including minimizing estimation error subject to budget
constraints. Besides our concrete results for regression problems, we
contribute a mechanism-design framework for designing and analyzing
statistical estimators whose training examples are supplied by workers who
incur a cost to label them.
Task Selection for Bandit-Based Task Assignment in Heterogeneous Crowdsourcing
Task selection (picking an appropriate labeling task) and worker selection
(assigning the labeling task to a suitable worker) are two major challenges in
task assignment for crowdsourcing. Recently, worker selection has been
successfully addressed by the bandit-based task assignment (BBTA) method, while
task selection has not been thoroughly investigated yet. In this paper, we
experimentally compare several task selection strategies borrowed from active
learning literature, and show that the least confidence strategy significantly
improves the performance of task assignment in crowdsourcing.
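The least confidence strategy is simple to state: request another label for the task whose current most-probable label has the lowest posterior probability. A minimal sketch, assuming per-task label posteriors are already available:

```python
def least_confidence_pick(label_probs):
    # label_probs: dict task -> dict label -> posterior probability.
    # Least confidence: choose the task whose most likely label has the
    # lowest probability, i.e. where another response helps most.
    return min(label_probs, key=lambda t: max(label_probs[t].values()))

task = least_confidence_pick({"t1": {"x": 0.9, "y": 0.1},
                              "t2": {"x": 0.55, "y": 0.45}})
```

Task "t2" is selected because its best guess is only 55% confident, whereas "t1" is already resolved with 90% confidence.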
T-Crowd: Effective Crowdsourcing for Tabular Data
Crowdsourcing employs human workers to solve computer-hard problems, such as
data cleaning, entity resolution, and sentiment analysis. When crowdsourcing
tabular data, e.g., the attribute values of an entity set, a worker's answers
on the different attributes (e.g., the nationality and age of a celebrity star)
are often treated independently. This assumption is not always true and can
lead to suboptimal crowdsourcing performance. In this paper, we present the
T-Crowd system, which takes into consideration the intricate relationships
among tasks, in order to converge faster to their true values. Particularly,
T-Crowd integrates each worker's answers on different attributes to effectively
learn his/her trustworthiness and the true data values. The attribute
relationship information is also used to guide task allocation to workers.
Finally, T-Crowd seamlessly supports categorical and continuous attributes,
which are the two main datatypes found in typical databases. Our extensive
experiments on real and synthetic datasets show that T-Crowd outperforms
state-of-the-art methods in terms of truth inference and reducing the cost of
crowdsourcing.
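The core idea of jointly learning worker trustworthiness and true values can be sketched with an iterative trust-weighted vote (an EM-style toy; unlike T-Crowd it treats tasks independently and handles only categorical labels):

```python
def infer_truth(answers, iters=10):
    # answers: dict worker -> {task: label}. Iteratively estimate task
    # truths by trust-weighted vote, then re-score each worker's trust
    # by agreement with the current truths.
    trust = {w: 1.0 for w in answers}
    truths = {}
    for _ in range(iters):
        votes = {}
        for w, labs in answers.items():
            for t, lab in labs.items():
                votes.setdefault(t, {}).setdefault(lab, 0.0)
                votes[t][lab] += trust[w]
        truths = {t: max(v, key=v.get) for t, v in votes.items()}
        for w, labs in answers.items():
            agree = sum(truths[t] == lab for t, lab in labs.items())
            trust[w] = (agree + 1) / (len(labs) + 2)  # smoothed accuracy
    return truths, trust

# Hypothetical workers: alice and bob agree, carol dissents on both tasks.
truths, trust = infer_truth({"alice": {"t1": "x", "t2": "y"},
                             "bob":   {"t1": "x", "t2": "y"},
                             "carol": {"t1": "z", "t2": "z"}})
```

The majority answers win out and the dissenting worker's trust score drops, which in turn down-weights that worker's votes in later iterations.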
Cheaper and Better: Selecting Good Workers for Crowdsourcing
Crowdsourcing provides a popular paradigm for data collection at scale. We
study the problem of selecting subsets of workers from a given worker pool to
maximize the accuracy under a budget constraint. One natural question is
whether we should hire as many workers as the budget allows, or restrict
attention to a small number of top-quality workers. By theoretically
analyzing the error rate
of a typical setting in crowdsourcing, we frame the worker selection problem
into a combinatorial optimization problem and propose an algorithm to solve it
efficiently. Empirical results on both simulated and real-world datasets show
that our algorithm is able to select a small number of high-quality workers,
and performs as well as, and sometimes better than, the much larger crowd
that the budget allows.
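The trade-off the paper analyzes, more workers versus better workers, can be seen already in a brute-force toy that searches affordable worker subsets for the lowest exact majority-vote error (worker accuracies and costs are assumed known here; the paper solves the combinatorial problem efficiently rather than by enumeration):

```python
from itertools import combinations

def majority_error(accs):
    # Exact probability that a majority of independent workers answer a
    # binary task incorrectly (ties counted as errors).
    probs = [1.0]  # probs[k] = P(exactly k correct answers so far)
    for p in accs:
        nxt = [0.0] * (len(probs) + 1)
        for k, q in enumerate(probs):
            nxt[k + 1] += q * p
            nxt[k] += q * (1 - p)
        probs = nxt
    need = len(accs) // 2 + 1  # strict majority needed to be correct
    return 1.0 - sum(probs[need:])

def best_subset(workers, budget):
    # workers: list of (accuracy, cost). Brute-force search over all
    # affordable odd-sized subsets; fine for small pools only.
    best, best_err = (), 1.0
    for r in range(1, len(workers) + 1, 2):  # odd sizes avoid ties
        for sub in combinations(range(len(workers)), r):
            if sum(workers[i][1] for i in sub) > budget:
                continue
            err = majority_error([workers[i][0] for i in sub])
            if err < best_err:
                best, best_err = sub, err
    return best, best_err

workers = [(0.9, 1), (0.6, 1), (0.55, 1), (0.95, 1)]  # hypothetical (accuracy, cost)
subset, err = best_subset(workers, budget=3)
```

With these numbers the single 0.95-accuracy worker beats every affordable three-worker committee, matching the observation that a small high-quality subset can outperform a larger crowd.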