Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing (CSP) system is one that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring far less manual effort.
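The hybrid design the abstract describes can be sketched as a minimal pipeline: a machine classifier labels each message, and low-confidence items are routed to crowd workers. This is an illustrative assumption about how such a CSP step might look, not AIDR's actual implementation; the threshold value and the stub `machine_classify` / `crowd_classify` functions are hypothetical.

```python
def machine_classify(message):
    """Stub machine classifier returning (label, confidence)."""
    if "collapsed" in message or "injured" in message:
        return "urgent", 0.95
    return "not_urgent", 0.55  # low confidence -> needs human review

def crowd_classify(message):
    """Stub for a crowdsourced label (e.g., a majority vote of workers)."""
    return "urgent" if "help" in message else "not_urgent"

def process_stream(messages, confidence_threshold=0.8):
    """Route each message: keep the machine label when it is confident,
    otherwise fall back to human intelligence (the crowdsourced task)."""
    results = []
    for msg in messages:
        label, conf = machine_classify(msg)
        if conf < confidence_threshold:
            label = crowd_classify(msg)  # crowdsourced step in the stream
        results.append((msg, label))
    return results

stream = ["bridge collapsed downtown", "need help, trapped under rubble"]
print(process_stream(stream))
```

Routing only the uncertain fraction of the stream to workers is what lets such a system beat a pure machine pipeline on accuracy while using far less human effort than full crowdsourcing.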
BUOCA: Budget-Optimized Crowd Worker Allocation
Due to concerns about human error in crowdsourcing, it is standard practice
to collect labels for the same data point from multiple internet workers. We
here show that the resulting budget can be used more effectively with a
flexible worker assignment strategy that asks fewer workers to analyze
easy-to-label data and more workers to analyze data that requires extra
scrutiny. Our main contribution is to show how the allocations of the number of
workers to a task can be computed optimally based on task features alone,
without using worker profiles. Our target tasks are delineating cells in
microscopy images and analyzing the sentiment toward the 2016 U.S. presidential
candidates in tweets. We first propose an algorithm that computes
budget-optimized crowd worker allocation (BUOCA). We next train a machine
learning system (BUOCA-ML) that predicts an optimal number of crowd workers
needed to maximize the accuracy of the labeling. We show that the computed
allocation can yield large savings in the crowdsourcing budget (up to 49
percent points) while maintaining labeling accuracy. Finally, we envisage a
human-machine system for performing budget-optimized data analysis at a scale
beyond the feasibility of crowdsourcing.
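The allocation idea above can be sketched with a toy rule: every task gets at least one worker, and the remaining budget is spread in proportion to a predicted per-task difficulty score (BUOCA-ML predicts such quantities from task features alone). The proportional rule, the scores, and the function name are illustrative assumptions, not the paper's actual algorithm.

```python
def allocate(difficulties, budget):
    """Split `budget` workers across tasks, giving harder tasks more workers.

    difficulties: per-task difficulty scores in (0, 1], e.g. a predicted
    error rate computed from task features alone (no worker profiles).
    Note: naive rounding can drift a worker or two from the exact budget.
    """
    n = len(difficulties)
    alloc = [1] * n                      # minimum one worker per task
    remaining = budget - n
    total = sum(difficulties)
    for i, d in enumerate(difficulties):
        alloc[i] += round(remaining * d / total)
    return alloc

# Two easy tasks and one hard task share a budget of 9 workers:
print(allocate([0.05, 0.05, 0.25], budget=9))  # → [2, 2, 5]
```

The easy tasks get two workers each while the hard one gets five, matching the abstract's strategy of asking fewer workers to analyze easy-to-label data.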
On optimality of jury selection in crowdsourcing
Recent advances in crowdsourcing technologies enable computationally
challenging tasks (e.g., sentiment analysis and entity resolution) to be
performed by Internet workers, driven mainly by monetary incentives. A
fundamental question is: how should workers be selected so that the tasks at
hand can be accomplished successfully and economically? In this paper, we
study the Jury Selection Problem (JSP): given a monetary budget and a set of
decision-making tasks (e.g., “Is Bill Gates still the CEO of Microsoft
now?”), return the set of workers (called a jury) whose answers yield the
highest “Jury Quality” (JQ). Existing JSP solutions use the Majority Voting
(MV) strategy, which takes the answer chosen by the largest number of
workers. We show that MV does not yield the best solution for JSP. We further
prove that among all voting strategies (including deterministic and
randomized strategies), Bayesian Voting (BV) solves JSP optimally. We then
examine how to solve JSP based on BV. This is technically challenging, since
computing the JQ under BV is NP-hard. We address this by proposing an
approximate algorithm that is computationally efficient. Our approximate JQ
computation algorithm is also highly accurate, with an error provably bounded
within 1%. We extend our solution by considering the task owner’s “belief”
(or prior) on the answers of the tasks. Experiments on synthetic and real
datasets show that our new approach consistently outperforms the best known
JSP solution.
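The MV-versus-BV gap can be made concrete with a small sketch. Here BV is taken to be the weighted MAP rule for binary tasks with known worker accuracies and a uniform prior (each vote weighted by its worker's log-odds); the jury composition and helper names are illustrative assumptions, not the paper's exact construction.

```python
from itertools import product
from math import log

def jury_quality(accuracies, strategy):
    """Expected probability the jury's decision is correct on a binary task,
    assuming (WLOG) the true answer is 1 and workers err independently."""
    jq = 0.0
    for votes in product([0, 1], repeat=len(accuracies)):
        # probability of this vote pattern given the true answer is 1
        prob = 1.0
        for v, p in zip(votes, accuracies):
            prob *= p if v == 1 else (1 - p)
        if strategy(votes, accuracies) == 1:
            jq += prob
    return jq

def majority(votes, accuracies):
    """MV: pick the answer chosen by the largest number of workers."""
    return 1 if sum(votes) * 2 > len(votes) else 0

def bayesian(votes, accuracies):
    """BV sketch: weight each vote by its worker's log-odds of being right;
    the sign of the total gives the maximum-a-posteriori answer."""
    score = sum((1 if v == 1 else -1) * log(p / (1 - p))
                for v, p in zip(votes, accuracies))
    return 1 if score > 0 else 0

jury = [0.9, 0.6, 0.6]  # one reliable worker, two near-random workers
print(jury_quality(jury, majority), jury_quality(jury, bayesian))
```

With this jury, MV achieves a JQ of 0.792 while BV reaches 0.9: the log-odds weights let the reliable worker's vote outweigh the two noisy ones, which plain majority counting cannot do.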
An online cost sensitive decision-making method in crowdsourcing systems
DOI: 10.1145/2463676.2465307. Proceedings of the ACM SIGMOD International
Conference on Management of Data, pp. 217-22