19 research outputs found
Finish Them!: Pricing Algorithms for Human Computation
Given a batch of human computation tasks, a commonly ignored aspect is how
the price (i.e., the reward paid to human workers) of these tasks must be set
or varied in order to meet latency or cost constraints. Often, the price is set
up-front and not modified, leading to either a much higher monetary cost than
needed (if the price is set too high), or to a much larger latency than
expected (if the price is set too low). Leveraging a pricing model from prior
work, we develop algorithms to optimally set and then vary price over time in
order to meet (a) a user-specified deadline while minimizing total monetary
cost, or (b) a user-specified monetary budget while minimizing total
elapsed time. We leverage techniques from decision theory (specifically, Markov
Decision Processes) for both these problems, and demonstrate that our
techniques lead to up to 30% reduction in cost over schemes proposed in prior
work. Furthermore, we develop techniques to speed-up the computation, enabling
users to leverage the price-setting algorithms on the fly.
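The deadline-constrained variant can be sketched as a finite-horizon MDP solved by backward induction. Everything concrete below is an assumption for illustration: the price levels, the `rate(price)` completion model, and the `PENALTY` charged per task left unfinished at the deadline all stand in for the calibrated pricing model the paper leverages.

```python
from math import comb

PRICES = (1, 2, 3, 4)   # hypothetical discrete price levels
PENALTY = 100.0         # hypothetical cost per task unfinished at the deadline

def rate(price):
    """Stand-in completion model: probability one task finishes in a single
    time step at this price (the paper uses a calibrated pricing model)."""
    return min(0.9, 0.1 + 0.2 * price)

def optimal_prices(n_tasks, deadline):
    """Backward induction over states (steps_left, tasks_left), minimizing
    expected monetary cost; returns the optimal cost and price policy."""
    # V[t][k]: minimal expected cost with t steps and k tasks remaining
    V = [[PENALTY * k for k in range(n_tasks + 1)]]  # t = 0: pay the penalty
    policy = {}
    for t in range(1, deadline + 1):
        row = [0.0] * (n_tasks + 1)
        for k in range(1, n_tasks + 1):
            best, best_p = float("inf"), None
            for p in PRICES:
                q = rate(p)
                # tasks completed this step ~ Binomial(k, q); pay p per completion
                exp_cost = sum(
                    comb(k, d) * q**d * (1 - q)**(k - d) * (p * d + V[t - 1][k - d])
                    for d in range(k + 1))
                if exp_cost < best:
                    best, best_p = exp_cost, p
            row[k] = best
            policy[(t, k)] = best_p
        V.append(row)
    return V[deadline][n_tasks], policy
```

Under this toy model, a tighter deadline pushes the policy toward higher prices, while a loose one lets cheaper prices amortize over more steps.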
Cheaper and Better: Selecting Good Workers for Crowdsourcing
Crowdsourcing provides a popular paradigm for data collection at scale. We
study the problem of selecting subsets of workers from a given worker pool to
maximize the accuracy under a budget constraint. One natural question is
whether we should hire as many workers as the budget allows, or restrict ourselves to a
small number of top-quality workers. By theoretically analyzing the error rate
of a typical setting in crowdsourcing, we frame the worker selection problem
into a combinatorial optimization problem and propose an algorithm to solve it
efficiently. Empirical results on both simulated and real-world datasets show
that our algorithm is able to select a small number of high-quality workers,
and performs as good as, sometimes even better than, the much larger crowds as
the budget allows
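The trade-off above can be made concrete with a tiny brute-force version of the selection problem: score each affordable odd-sized subset by the exact majority-vote error of its workers and keep the best. The accuracies, costs, and exhaustive search are illustrative assumptions; the paper's algorithm solves the combinatorial problem efficiently rather than by enumeration.

```python
from itertools import combinations
from math import prod

def majority_error(accs):
    """Exact probability that a majority vote of independent workers with
    accuracies `accs` is wrong (odd-sized subsets, so no ties)."""
    n = len(accs)
    err = 0.0
    for mask in range(2**n):  # bit i set = worker i answers wrong
        correct = sum(1 for i in range(n) if not (mask >> i) & 1)
        if correct <= n // 2:
            err += prod(accs[i] if not (mask >> i) & 1 else 1 - accs[i]
                        for i in range(n))
    return err

def select_workers(accs, costs, budget):
    """Brute-force the best odd-sized subset within the budget
    (illustration only; exponential in the pool size)."""
    best_err, best_set = 1.0, ()
    for r in range(1, len(accs) + 1, 2):  # odd sizes avoid tie-breaking
        for subset in combinations(range(len(accs)), r):
            if sum(costs[i] for i in subset) <= budget:
                e = majority_error([accs[i] for i in subset])
                if e < best_err:
                    best_err, best_set = e, subset
    return best_set, best_err
```

On a toy pool with one expensive expert (accuracy 0.9) and several cheap mediocre workers (accuracy around 0.55), the single expert beats any affordable crowd, mirroring the paper's finding that a small high-quality subset can outperform the largest crowd the budget allows.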
History, Directions, and Some Open Problems of Modern Crowdsourcing Research as a Scientific and Practical Discipline
Today, crowdsourcing is a widely used approach to many data collection and
aggregation tasks. This work surveys crowdsourcing research as a scientific
and practical discipline, identifies its main research directions, and
formulates some open problems within those directions.
Globally Optimal Crowdsourcing Quality Management
We study crowdsourcing quality management, that is, given worker responses to
a set of tasks, our goal is to jointly estimate the true answers for the tasks,
as well as the quality of the workers. Prior work on this problem relies
primarily on applying Expectation-Maximization (EM) on the underlying maximum
likelihood problem to estimate true answers as well as worker quality.
Unfortunately, EM only provides a locally optimal solution rather than a
globally optimal one. Other solutions to the problem (that do not leverage EM)
fail to provide global optimality guarantees as well. In this paper, we focus
on filtering, where tasks require the evaluation of a yes/no predicate, and
rating, where tasks elicit integer scores from a finite domain. We design
algorithms for finding the globally optimal estimates of correct task answers and
worker quality for the underlying maximum likelihood problem, and characterize
the complexity of these algorithms. Our algorithms conceptually consider all
mappings from tasks to true answers (typically a very large number), leveraging
two key ideas to reduce, by several orders of magnitude, the number of mappings
under consideration, while preserving optimality. We also demonstrate that
these algorithms often find more accurate estimates than EM-based algorithms.
This paper makes an important contribution towards understanding the inherent
complexity of globally optimal crowdsourcing quality management
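The "consider all mappings" idea can be sketched for the filtering setting: enumerate every assignment of yes/no truths to tasks, plug in each worker's maximum-likelihood error rate (their disagreement fraction with that assignment), and keep the assignment with the highest likelihood. The symmetry-breaking rule (workers beat random guessing) and the raw enumeration are illustrative assumptions; the paper's contribution is precisely the pruning that cuts this exponential space by orders of magnitude while preserving optimality.

```python
from itertools import product
from math import log

def global_mle(responses):
    """Brute-force global maximum-likelihood estimate for filtering tasks.
    `responses[w][t]` is worker w's 0/1 answer to task t."""
    n_workers, n_tasks = len(responses), len(responses[0])
    best_ll, best_map = float("-inf"), None
    for truth in product([0, 1], repeat=n_tasks):
        err_counts = [sum(r[t] != truth[t] for t in range(n_tasks))
                      for r in responses]
        # break the label-flip symmetry: assume workers beat random guessing
        if sum(err_counts) / (n_workers * n_tasks) > 0.5:
            continue
        ll = 0.0
        for errs in err_counts:
            e = errs / n_tasks  # this worker's MLE error rate under `truth`
            ll += (errs * log(max(e, 1e-12))
                   + (n_tasks - errs) * log(max(1 - e, 1e-12)))
        if ll > best_ll:
            best_ll, best_map = ll, truth
    return best_map
```

Even this naive version returns the global optimum on small instances, whereas EM started from a bad initialization can get stuck in a local one.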
Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition
such as Figures, Tables, Definitions, Algorithms, etc., which are called Knowledge Cells hereafter. An advanced academic search engine that could take advantage of Knowledge Cells and their various relationships to obtain more accurate search results is expected. Further, it is expected to provide fine-grained search over Knowledge Cells for deep-level information discovery and exploration. It is therefore important to identify and extract the Knowledge Cells and their various relationships, which are often intrinsic and implicit in articles. With the exponential growth of scientific publications, the discovery and acquisition of such useful academic knowledge pose practical challenges. For example, existing algorithmic methods can hardly extend to handle the diverse layouts of journals, nor scale up to process massive documents. As crowdsourcing has become a powerful paradigm for large-scale problem solving, especially for tasks that are difficult for computers but easy for humans, we treat academic knowledge discovery and acquisition as a crowdsourced database problem and present a hybrid framework that integrates the accuracy of crowdsourcing workers with the speed of automatic algorithms. In this paper, we introduce our current system implementation, a platform for academic knowledge discovery and acquisition (PANDA), as well as some interesting observations and promising future directions.
Comprehensive and Reliable Crowd Assessment Algorithms
Evaluating workers is a critical aspect of any crowdsourcing system. In this
paper, we devise techniques for evaluating workers by finding confidence
intervals on their error rates. Unlike prior work, we focus on
"conciseness"---that is, giving as tight a confidence interval as possible.
Conciseness is of utmost importance because it allows us to be sure that we
have the best guarantee possible on worker error rate. Also unlike prior work,
we provide techniques that work under very general scenarios, such as when not
all workers have attempted every task (a fairly common scenario in practice),
when tasks have non-boolean responses, and when workers have different biases
for positive and negative tasks. We demonstrate conciseness as well as accuracy
of our confidence intervals by testing them on a variety of conditions and
multiple real-world datasets.
Comment: ICDE 201
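A standard example of a tight per-worker interval is the Wilson score interval, which needs only each worker's own (errors, attempts) counts and therefore handles workers who attempted only a subset of the tasks. This is one conventional choice shown for illustration, not the paper's technique, which derives tighter intervals under still more general conditions (non-boolean responses, asymmetric biases).

```python
from math import sqrt

def wilson_interval(errors, attempts, z=1.96):
    """Wilson score confidence interval (~95% for z = 1.96) for a worker's
    error rate, given how many of their attempted tasks they got wrong."""
    if attempts == 0:
        return (0.0, 1.0)  # no evidence: vacuous interval
    p = errors / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / attempts
                              + z**2 / (4 * attempts**2))
    return (max(0.0, center - half), min(1.0, center + half))
```

Compared with the naive normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for workers with few attempts, and it tightens as a worker attempts more tasks, which is exactly the conciseness criterion at stake.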