Optimum Statistical Estimation with Strategic Data Sources
We propose an optimum mechanism for providing monetary incentives to the data
sources of a statistical estimator such as linear regression, so that high
quality data is provided at low cost, in the sense that the sum of payments and
estimation error is minimized. The mechanism applies to a broad range of
estimators, including linear and polynomial regression, kernel regression, and,
under some additional assumptions, ridge regression. It also generalizes to
several objectives, including minimizing estimation error subject to budget
constraints. Besides our concrete results for regression problems, we
contribute a mechanism design framework through which to design and analyze
statistical estimators whose examples are supplied by workers with cost for
labeling said examples.
Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation
A human computation system can be viewed as a distributed system in which the
processors are humans, called workers. Such systems harness the cognitive power
of a group of workers connected to the Internet to execute relatively simple
tasks, whose solutions, once grouped, solve a problem that systems equipped
with only machines could not solve satisfactorily. Examples of such systems are
Amazon Mechanical Turk and the Zooniverse platform. A human computation
application comprises a group of tasks, each of which can be performed by one
worker. Tasks may depend on one another. In this study, we
propose a theoretical framework to analyze this type of application from a
distributed systems point of view. Our framework is established on three
dimensions that represent different perspectives in which human computation
applications can be approached: quality-of-service requirements, design and
management strategies, and human aspects. By using this framework, we review
human computation from the perspective of programmers seeking to improve the
design of human computation applications and managers seeking to increase the
effectiveness of human computation infrastructures in running such
applications. In doing so, besides integrating and organizing what has been
done in this direction, we also put into perspective the fact that the human
aspects of the workers in such systems introduce new challenges in terms of,
for example, task assignment, dependency management, and fault prevention and
tolerance. We discuss how they are related to distributed systems and other
areas of knowledge.
Comment: 3 figures, 1 table
Optimal Crowdsourced Classification with a Reject Option in the Presence of Spammers
We explore the design of an effective crowdsourcing system for an M-ary
classification task. Crowd workers complete simple binary microtasks whose
results are aggregated to give the final decision. We consider the scenario
where the workers have a reject option so that they are allowed to skip
microtasks when they are unable to or choose not to respond to binary
microtasks. We present an aggregation approach using a weighted majority voting
rule, where each worker's response is assigned an optimized weight to maximize
the crowd's classification performance.
Comment: submitted to ICASSP 201
Crowdsourcing Paper Screening in Systematic Literature Reviews
Literature reviews allow scientists to stand on the shoulders of giants,
showing promising directions, summarizing progress, and pointing out existing
challenges in research. At the same time, conducting a systematic literature
review is a laborious and consequently expensive process. In the last decade,
there have been a few studies on crowdsourcing in literature reviews. This paper
explores the feasibility of crowdsourcing for facilitating the literature
review process in terms of results, time, and effort, and identifies
which crowdsourcing strategies provide the best results based on the budget
available. In particular, we focus on the screening phase of the literature
review process and we contribute and assess methods for identifying the size of
tests, labels required per paper, and classification functions as well as
methods to split the crowdsourcing process into phases to improve results.
Finally, we present our findings based on experiments run on Crowdflower.