Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation
A human computation system can be viewed as a distributed system in which the
processors are humans, called workers. Such systems harness the cognitive power
of a group of workers connected to the Internet to execute relatively simple
tasks, whose solutions, once grouped, solve a problem that systems equipped
with only machines could not solve satisfactorily. Examples of such systems are
Amazon Mechanical Turk and the Zooniverse platform. A human computation
application comprises a group of tasks, each of which can be performed by one
worker. Tasks may have dependencies on one another. In this study, we
propose a theoretical framework to analyze this type of application from a
distributed systems point of view. Our framework is established on three
dimensions that represent different perspectives in which human computation
applications can be approached: quality-of-service requirements, design and
management strategies, and human aspects. Using this framework, we review
human computation from the perspective of programmers seeking to improve the
design of human computation applications and managers seeking to increase the
effectiveness of human computation infrastructures in running such
applications. In doing so, besides integrating and organizing what has been
done in this direction, we also put into perspective the fact that the human
aspects of the workers in such systems introduce new challenges in terms of,
for example, task assignment, dependency management, and fault prevention and
tolerance. We discuss how they are related to distributed systems and other
areas of knowledge.
Comment: 3 figures, 1 table
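The dependency management the abstract mentions can be sketched in a few lines. The task names, worker pool, and round-robin assignment below are illustrative assumptions, not the paper's model: tasks form a dependency graph, and a dispatcher assigns each task to a worker only in an order that respects the dependencies.

```python
# Illustrative sketch: dependency-aware assignment of human-computation
# tasks. Task names and the round-robin worker pool are hypothetical.
from graphlib import TopologicalSorter
from itertools import cycle

# Each task maps to the set of tasks it depends on; "aggregate" can only
# run after both labeling tasks are done.
deps = {
    "label_image_1": set(),
    "label_image_2": set(),
    "aggregate": {"label_image_1", "label_image_2"},
}

workers = cycle(["worker_a", "worker_b"])  # simple round-robin assignment

# static_order() yields tasks so that every dependency precedes its dependents.
assignment = {task: next(workers) for task in TopologicalSorter(deps).static_order()}
print(assignment)
```

A real platform would dispatch tasks as workers become available rather than precompute the order, but the topological constraint is the same.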
Incentivizing Truthful Responses with the Logarithmic Peer Truth Serum
We consider a participatory sensing scenario where a group of private sensors observes the same phenomenon, such as air pollution. We design a novel payment mechanism that incentivizes participation and honest behavior using the peer prediction approach, i.e., by comparing sensors' reports. As is the case with other peer prediction methods, the mechanism admits uninformed reporting equilibria. In the novel mechanism, however, these equilibria result in a worse payoff than truthful reporting.
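The peer prediction idea can be illustrated with a deliberately simplified payment rule (this is a sketch in the spirit of peer truth serum mechanisms, not the paper's exact formula): a sensor is rewarded only when its report matches a peer's, scaled by the log-inverse frequency of the answer, so that agreement on a rare answer pays more than agreement on an expected one.

```python
# Simplified peer-consistency payment, hedged: NOT the paper's exact
# mechanism, just the "surprisingly common answers pay more" intuition.
import math
from collections import Counter

def peer_truth_payment(report, peer_report, all_reports):
    """Pay log(1/frequency) on a match with the peer, zero otherwise."""
    freq = Counter(all_reports)
    p = freq[report] / len(all_reports)  # empirical frequency of this answer
    return math.log(1 / p) if report == peer_report else 0.0

reports = ["high", "high", "low", "high"]
print(peer_truth_payment("low", "low", reports))    # rare match: larger reward
print(peer_truth_payment("high", "high", reports))  # common match: smaller reward
print(peer_truth_payment("low", "high", reports))   # mismatch: no reward
```

Under such a rule, always reporting the most common answer yields a low payoff, while honest reports that happen to agree on an unexpected observation are rewarded most.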
How many crowdsourced workers should a requester hire?
Recent years have seen an increased interest in crowdsourcing as a way of obtaining information from a potentially large group of workers at a reduced cost. The crowdsourcing process, as we consider it in this paper, is as follows: a requester hires a number of workers to work on a set of similar tasks. After completing the tasks, each worker reports back outputs. The requester then aggregates the reported outputs to obtain aggregate outputs. A crucial question that arises during this process is: how many crowd workers should a requester hire? In this paper, we investigate from an empirical perspective the optimal number of workers a requester should hire when crowdsourcing tasks, with a particular focus on the crowdsourcing platform Amazon Mechanical Turk. Specifically, we report the results of three studies involving different tasks and payment schemes. We find that both the expected error in the aggregate outputs and the risk of a poor combination of workers decrease as the number of workers increases. Surprisingly, we find that the optimal number of workers a requester should hire for each task is around 10 to 11, regardless of the underlying task and payment scheme. To derive this result, we employ a principled analysis based on bootstrapping and segmented linear regression. Beyond this result, we also find that overall top-performing workers are more consistent across multiple tasks than other workers. Our results thus contribute to a better understanding of, and provide new insights into, how to design more effective crowdsourcing processes.
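The bootstrapping part of the analysis can be sketched as follows. The data are synthetic (workers correct with an assumed probability of 0.7), not the paper's: resample n workers' labels from a pool, aggregate by majority vote, and estimate how the aggregate's error falls as n grows.

```python
# Bootstrap sketch with synthetic data: how does majority-vote error
# change with the number of hired workers? The 0.7 accuracy is assumed.
import random

random.seed(0)
TRUTH = 1
# Simulated pool of binary worker labels, each correct with probability 0.7.
pool = [1 if random.random() < 0.7 else 0 for _ in range(200)]

def bootstrap_error(n, trials=2000):
    """Estimate the error rate of a majority vote over n resampled workers."""
    errors = 0
    for _ in range(trials):
        sample = random.choices(pool, k=n)        # bootstrap-resample n workers
        majority = 1 if sum(sample) * 2 > n else 0
        errors += majority != TRUTH
    return errors / trials

for n in (1, 5, 11):  # odd n avoids ties in the majority vote
    print(n, bootstrap_error(n))
```

The paper then fits a segmented linear regression to such error-versus-n curves to locate the point of diminishing returns; here one only sees the error shrinking as n increases.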
Incentive Schemes for Participatory Sensing
We consider a participatory sensing scenario where a group of private sensors observes the same phenomenon, such as air pollution. Since sensors are costly to install and maintain, their owners may be inclined to provide inaccurate or random data. We design a novel payment mechanism that incentivizes honest behavior by scoring sensors based on the quality of their reports. The basic principle follows the standard Bayesian Truth Serum (BTS) paradigm, where the highest rewards are obtained for reports that are surprisingly common. The mechanism, however, eliminates the main drawback of the BTS in a sensing scenario, since it does not require sensors to report predictions regarding the overall distribution of sensors' measurements. As is the case with other peer prediction methods, the mechanism admits uninformed equilibria. In the novel mechanism, however, these equilibria result in a worse payoff than truthful reporting.
Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques and Assurance Actions
Crowdsourcing enables one to leverage the intelligence and wisdom of
potentially large groups of individuals to solve problems. Common
problems approached with crowdsourcing are labeling images, translating or
transcribing text, providing opinions or ideas, and the like: all tasks that
computers are not good at or where they may even fail altogether. The
introduction of humans into computations and/or everyday work, however, also
poses critical, novel challenges in terms of quality control, as the crowd is
typically composed of people with unknown and very diverse abilities, skills,
interests, personal objectives and technological resources. This survey studies
quality in the context of crowdsourcing along several dimensions, so as to
define and characterize it and to understand the current state of the art.
Specifically, this survey derives a quality model for crowdsourcing tasks,
identifies the methods and techniques that can be used to assess the attributes
of the model, and the actions and strategies that help prevent and mitigate
quality problems. An analysis of how these features are supported by the state
of the art further identifies open issues and informs an outlook on hot future
research directions.
Comment: 40 pages main paper, 5 pages appendix
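One of the simplest assessment techniques such surveys cover is worth a concrete sketch: estimating a worker's accuracy from a few embedded "gold" questions whose answers the requester already knows. The question names and answers below are illustrative, not taken from the survey.

```python
# Hypothetical sketch of gold-question worker assessment. The questions
# and answers are made up for illustration.
GOLD = {"q1": "cat", "q2": "dog", "q3": "cat"}

def worker_accuracy(answers):
    """Fraction of gold questions this worker answered correctly, or None
    if the worker answered none of them."""
    graded = [answers[q] == GOLD[q] for q in GOLD if q in answers]
    return sum(graded) / len(graded) if graded else None

answers = {"q1": "cat", "q2": "cat", "q3": "cat"}
print(worker_accuracy(answers))  # 2 of 3 gold answers are correct
```

In practice this estimate feeds assurance actions such as filtering low-accuracy workers or weighting their outputs during aggregation.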
Incentives for Effort in Crowdsourcing using the Peer Truth Serum
Crowdsourcing is widely proposed as a method to solve a large variety of judgement tasks, such as classifying website content, peer grading in online courses, or collecting real-world data. As the data reported by workers cannot be verified, there is a tendency to report random data without actually solving the task. This can be countered by making the reward for an answer depend on its consistency with answers given by other workers, an approach called peer consistency. However, the best strategy in such schemes is clearly for all workers to report the same answer without solving the task. Dasgupta and Ghosh (WWW 2013) show that in some cases exerting high effort can be encouraged in the highest-paying equilibrium. In this paper, we present a general mechanism that implements this idea and is applicable to most crowdsourcing settings. Furthermore, we experimentally test the novel mechanism and validate its theoretical properties.
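The idea the paper builds on can be sketched with a simplified payment rule in the spirit of the Dasgupta-Ghosh mechanism (the exact formula and the example data are assumptions for illustration): pay a worker for agreeing with a peer on a shared task, minus the rate at which the two agree on unrelated tasks, so that blind "everyone reports the same answer" strategies earn nothing.

```python
# Simplified peer-consistency bonus, hedged: a sketch of the idea, not
# the paper's exact mechanism. All example answers are made up.
def consistency_payment(my_shared, peer_shared, my_other, peer_other):
    """Agreement on the shared task minus the baseline agreement rate
    measured on tasks the pair did NOT share."""
    agree_shared = 1.0 if my_shared == peer_shared else 0.0
    base_rate = sum(a == b for a, b in zip(my_other, peer_other)) / len(my_other)
    return agree_shared - base_rate

# Blind copying: always answering "yes" agrees everywhere, so the bonus
# nets out to zero.
print(consistency_payment("yes", "yes", ["yes"] * 4, ["yes"] * 4))
# Informative answers: agreement on the shared task, lower baseline
# agreement elsewhere, so the bonus is positive.
print(consistency_payment("cat", "cat", ["cat", "dog"], ["dog", "dog"]))
```

Subtracting the baseline agreement rate is what removes the payoff of the uninformative all-report-the-same equilibrium while still rewarding genuine, effortful agreement.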
Enhancing Reliability Using Peer Consistency Evaluation in Human Computation
Peer consistency evaluation is often used in games with a purpose (GWAP) to evaluate workers using the outputs of other workers, without using gold standard answers. Despite its popularity, the reliability of peer consistency evaluation has never been systematically tested to show how it can be used as a general evaluation method in human computation systems. We present experimental results showing that human computation systems using peer consistency evaluation can lead to outcomes that are even better than those that evaluate workers using gold standard answers. We also show that even without evaluation, simply telling workers that their answers will be used as future evaluation standards can significantly enhance the workers' performance. These results have important implications for methods that improve the reliability of human computation systems.
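The evaluation method itself is simple to state in code. This is a minimal sketch with made-up answer data, not the paper's experiment: instead of grading a worker against gold standard answers, grade them against the outputs of a randomly paired peer.

```python
# Minimal sketch of peer consistency evaluation; all data are invented.
def peer_score(worker_answers, peer_answers):
    """Fraction of tasks on which the worker agrees with the paired peer."""
    agree = sum(a == b for a, b in zip(worker_answers, peer_answers))
    return agree / len(worker_answers)

w1 = ["cat", "dog", "cat", "dog"]
w2 = ["cat", "dog", "dog", "dog"]
print(peer_score(w1, w2))  # agreement on 3 of 4 tasks
```

The paper's point is empirical: scores computed this way, with no gold standard at all, can evaluate and motivate workers as effectively as gold-based grading.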