64 research outputs found

    Collaboratively crowdsourcing workflows with Turkomatic


    Crowdsourcing Processes: A Survey of Approaches and Opportunities

    This article makes a case for crowdsourcing approaches that are able to manage crowdsourcing processes, that is, crowdsourcing scenarios that go beyond the mere outsourcing of multiple instances of a micro-task and instead require the coordination of multiple different crowd and machine tasks. It introduces the necessary background and terminology, identifies a set of analysis dimensions, and surveys state-of-the-art tools, highlighting strong and weak aspects and promising future research and development directions.

    Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation

    A human computation system can be viewed as a distributed system in which the processors are humans, called workers. Such systems harness the cognitive power of a group of workers connected to the Internet to execute relatively simple tasks whose solutions, once grouped, solve a problem that systems equipped with only machines could not solve satisfactorily. Examples of such systems are Amazon Mechanical Turk and the Zooniverse platform. A human computation application comprises a group of tasks, each of which can be performed by one worker; tasks might have dependencies among each other. In this study, we propose a theoretical framework to analyze this type of application from a distributed systems point of view. Our framework is organized along three dimensions that represent different perspectives from which human computation applications can be approached: quality-of-service requirements, design and management strategies, and human aspects. Using this framework, we review human computation from the perspective of programmers seeking to improve the design of human computation applications and of managers seeking to increase the effectiveness of human computation infrastructures in running such applications. In doing so, besides integrating and organizing what has been done in this direction, we also put into perspective the fact that the human aspects of the workers in such systems introduce new challenges in terms of, for example, task assignment, dependency management, and fault prevention and tolerance. We discuss how these challenges relate to distributed systems and other areas of knowledge.
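The dependency management challenge mentioned in this abstract can be sketched in a few lines: before a task is posted to a worker, all of its prerequisite tasks must have completed. A minimal illustration (my own sketch using Kahn's topological sort, not code from the paper; the task names are hypothetical):

```python
from collections import deque

def schedule(tasks, deps):
    """Return an order in which human-computation tasks can be posted to
    workers while respecting dependencies (Kahn's topological sort).

    tasks: iterable of task ids
    deps:  dict mapping a task to the set of tasks it depends on
    """
    indegree = {t: len(deps.get(t, set())) for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(task)

    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # this task can now be assigned to a worker
        for nxt in dependents[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("cyclic dependencies among tasks")
    return order

# Hypothetical image-processing application: collect, then filter,
# then classify, then aggregate.
order = schedule(
    ["collect", "filter", "classify", "aggregate"],
    {"filter": {"collect"}, "classify": {"filter"}, "aggregate": {"classify"}},
)
```

A real platform would additionally handle the human aspects the paper discusses (worker dropout, errors), e.g. by re-posting a task whose worker fails instead of marking it complete.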

    BPMN task instance streaming for efficient micro-task crowdsourcing processes

    The Business Process Model and Notation (BPMN) is a standard for modeling and executing business processes with human or machine tasks. The semantics of tasks is usually discrete: a task has exactly one start event and one end event; for multi-instance tasks, all instances must complete before an end event is emitted. We propose a new task type and streaming connector for crowdsourcing, able to run hundreds or thousands of micro-task instances in parallel. The two constructs provide a task streaming semantics that is new to BPMN, enable the modeling and efficient enactment of complex crowdsourcing scenarios, and are applicable also beyond the special case of crowdsourcing. We implement the necessary design and runtime support on top of CrowdFlower, demonstrate the viability of the approach via a case study, and report on a set of runtime performance experiments.
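The difference between the standard discrete multi-instance semantics and the streaming semantics described above can be illustrated as follows (an illustrative Python sketch of the idea only, not the paper's BPMN implementation; `label` is a hypothetical stand-in for a crowd micro-task):

```python
def run_instances_batch(items, work):
    # Discrete BPMN multi-instance semantics: the task's end event fires
    # only once every instance has completed, so downstream tasks see
    # all results at the same time.
    return [work(item) for item in items]

def run_instances_streaming(items, work):
    # Streaming semantics: each instance's result is forwarded
    # downstream as soon as it is available, so a pipeline of
    # micro-tasks can overlap instead of running phase by phase.
    for item in items:
        yield work(item)

def label(image):
    # Hypothetical stand-in for a crowd micro-task (e.g. image labeling).
    return "label:" + image

stream = run_instances_streaming(["img1", "img2", "img3"], label)
first = next(stream)  # a downstream task can already start on this result
```

With thousands of parallel instances, this overlap is what makes the enactment efficient: the slowest worker no longer delays every downstream phase.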

    Cost and Quality Assurance in Crowdsourcing Workflows (Extended Abstract)

    Despite recent advances in artificial intelligence and machine learning, many tasks still require human contributions. With the growing availability of Internet access, it is now possible to hire workers on crowdsourcing marketplaces. Many crowdsourcing platforms have emerged in the last decade: Amazon Mechanical Turk, Figure Eight, Wirk, etc. A platform allows employers to post tasks, which are then realized by workers hired from the crowd in exchange for some incentive [3, 19]. Common tasks include image annotation, surveys, classification, recommendation, sentiment analysis, etc. [7]. The existing platforms support simple, repetitive, and independent micro-tasks that require a few minutes to an hour to complete. However, many real-world problems are not simple micro-tasks, but rather complex orchestrations of dependent tasks that process input data and collect human expertise. Existing platforms provide interfaces to post micro-tasks to a crowd, but cannot handle complex tasks. The next stage of crowdsourcing is to build systems that specify and execute complex tasks over existing crowd platforms. A natural solution is to use workflows, i.e., orchestrations of phases that exchange data to achieve a final objective. Figure 1 is an example of a complex workflow depicting the image annotation process on SPIPOLL [5], a platform to survey populations of pollinating insects. Contributors take pictures of insects that are then classified by crowdworkers. Pictures are grouped in a dataset that is input to the first node of the workflow, where an initial phase filters out bad pictures (fuzzy, blurred, ...). The remaining pictures are sent to workers who try to classify them; if classification is too difficult, the image is sent to an expert. Initial classification and expert classification are represented by two successive phases in the workflow. Pictures that were discarded, classified easily, or studied by experts are then assembled into a result dataset in a final phase, in order to compute statistics on insect populations. Workflows alone are not sufficient to crowdsource complex tasks. Many data-centric applications come with budget and quality constraints: as human workers are prone to errors, one has to hire several workers to aggregate a final answer with sufficient confidence. An unlimited budget allows hiring large pools of workers to assemble reliable answers for each micro-task, but in general a client for a complex task has a limited budget. This forces one to replicate micro-tasks in an optimal way, achieving the best possible quality without exhausting the given budget. The objective is hence to obtain a reliable result, forged through a complex orchestration, at a reasonable cost. Several works consider data-centric models, deployment on crowdsourcing platforms, and aggregation techniques to improve data quality (see [11] for a more complete bibliography). First, coordination of tasks has been considered in languages such as BPM
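The replication-under-budget idea described in this abstract can be made concrete with a small sketch (illustrative Python of my own, not the paper's algorithm; a real allocator might skew replicas toward hard tasks rather than spreading them uniformly):

```python
from collections import Counter

def replicas_per_task(budget, cost_per_answer, num_tasks):
    # Uniform replication: how many workers can answer each micro-task
    # without exceeding the total budget.
    return budget // (cost_per_answer * num_tasks)

def majority_vote(answers):
    # Aggregate redundant worker answers: the most frequent label wins,
    # and the confidence is the fraction of workers that agree with it.
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)

# With a budget of 90 units, 3 units per answer, and 10 micro-tasks,
# each task can be replicated 3 times; majority voting then forges a
# single answer per task from the redundant worker responses.
k = replicas_per_task(budget=90, cost_per_answer=3, num_tasks=10)
answer, confidence = majority_vote(["bee", "bee", "fly"])
```

The trade-off the abstract points at is exactly this: a larger budget raises the replica count and hence the confidence of each aggregated answer, while a tight budget forces fewer replicas per task.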