Efficiently identifying a well-performing crowd process for a given problem
With the increasing popularity of crowdsourcing and crowd computing, the question of how to select a well-performing crowd process for a problem at hand is growing ever more important. Prior work cast crowd process selection as an optimization problem whose solution is the crowd process that performs best for a user’s problem. However, existing approaches require users to probabilistically model aspects of the problem, which may entail a substantial investment of time and may be error-prone. We propose to use black-box optimization instead, a family of techniques that do not require probabilistic modelling by the end user. Specifically, we adopt Bayesian Optimization to approximate the maximum of a utility function quantifying the user’s business objectives while minimizing search cost. We validate our approach in a simulation and in three real-world experiments.
The black-box nature of our approach may lower the entry barrier to efficiently building crowdsourcing solutions.
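To make the optimization loop concrete, the following is a minimal sketch of Bayesian Optimization over a crowd-process design space, using scikit-optimize's gp_minimize. It is not the paper's actual setup: the design space (votes per item, pay per task), the utility trade-off, and run_crowd_process() are all hypothetical placeholders.

```python
# Minimal sketch, assuming a two-knob crowd process and a simulated
# probe batch; not the authors' implementation.
import math

from skopt import gp_minimize
from skopt.space import Integer, Real

# Hypothetical design space of a crowd process: redundancy and payment.
space = [
    Integer(1, 9, name="votes_per_item"),
    Real(0.01, 0.50, name="pay_per_task_usd"),
]

def run_crowd_process(votes_per_item, pay_per_task_usd):
    """Stand-in for launching the configured process on a probe batch;
    here we just simulate diminishing returns of redundancy and pay."""
    accuracy = 1.0 - 0.5 * math.exp(-votes_per_item * (0.5 + pay_per_task_usd))
    cost = votes_per_item * pay_per_task_usd * 100  # e.g. 100 probe items
    return accuracy, cost

def negative_utility(params):
    votes, pay = params
    accuracy, cost = run_crowd_process(votes, pay)
    # gp_minimize minimizes, so return the negated utility; the
    # accuracy/cost weighting is an arbitrary example.
    return -(accuracy - 0.01 * cost)

# Each evaluation launches one candidate crowd process, so a small
# n_calls budget keeps search cost (crowdsourcing spend) low.
result = gp_minimize(negative_utility, space, n_calls=20, random_state=0)
print("best configuration:", result.x, "estimated utility:", -result.fun)
```

Notably, the end user only supplies a utility function and a design space; no probabilistic model of the problem is required, which is the point of the black-box formulation.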
Enhancing worker management and supporting external tasks in crowdsourced data labeling
Human data labeling is key to training supervised machine learning (ML) models. We propose a new software infrastructure layer to augment the capabilities of Amazon’s SageMaker Ground Truth (GT) data labeling platform. Whereas crowdsourced annotation via Amazon Mechanical Turk (MTurk) is well established, Amazon’s more recent GT platform is less well known but is specifically designed to support ML annotation. Differentiating features include a curated “public crowd” sourced from MTurk and the integration of human labeling into Amazon’s broader SageMaker ML tool suite, which provides an end-to-end pipeline for training and deploying ML services. Key features of our software layer include: 1) continuous monitoring of worker performance with respect to Requester gold labels; 2) automatic restriction of task access when performance standards are not met; 3) geographic restriction of task access to US-based workers; and 4) the ability to conduct external tasks off-platform while sourcing workers from GT and continuing to use GT’s payment system. Our design seeks to streamline the Requester experience with minimal changes and to follow a sustainable software design that eases long-term management, extension, and maintenance. More generally, our design goals center on promoting efficient, user-friendly, and quality-focused data labeling with crowdsourced annotators.
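As an illustration of features 1) and 2), here is a minimal, platform-agnostic sketch of tracking each worker's running accuracy against Requester gold labels and revoking task access once a quality bar is missed. The threshold, the minimum sample size, and the restrict_worker() hook are assumptions for the sketch; a real deployment would instead call into GT's work-team management at that point.

```python
# Minimal sketch of gold-label monitoring with automatic task-access
# restriction; assumed constants, not the authors' code.
from collections import defaultdict

ACCURACY_THRESHOLD = 0.8   # assumed quality bar
MIN_GOLD_ANSWERS = 10      # don't judge a worker on too few gold items

gold_labels = {"item-17": "cat", "item-42": "dog"}  # Requester gold
stats = defaultdict(lambda: {"correct": 0, "total": 0})
blocked = set()

def restrict_worker(worker_id):
    """Hypothetical hook: a real system would remove the worker from
    the work team or deny them further tasks via the platform."""
    blocked.add(worker_id)

def record_answer(worker_id, item_id, answer):
    """Update a worker's running accuracy whenever they answer a gold
    item, and restrict access if they fall below the threshold."""
    if item_id not in gold_labels or worker_id in blocked:
        return
    s = stats[worker_id]
    s["total"] += 1
    s["correct"] += int(answer == gold_labels[item_id])
    if s["total"] >= MIN_GOLD_ANSWERS and s["correct"] / s["total"] < ACCURACY_THRESHOLD:
        restrict_worker(worker_id)
```

Requiring a minimum number of gold answers before acting avoids penalizing workers on statistically meaningless samples, which matters when gold items are sparsely interleaved with regular tasks.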