Supporting Answerers with Feedback in Social Q&A
Prior research has examined the use of Social Question and Answer (Q&A)
websites for answer and help seeking. However, the potential for these websites
to support domain learning has not yet been realized. Helping users write
effective answers can be beneficial for subject area learning for both
answerers and the recipients of answers. In this study, we examine the utility
of crowdsourced, criteria-based feedback for answerers on a student-centered
Q&A website, Brainly.com. In an experiment with 55 users, we compared
perceptions of the current rating system against two feedback designs with
explicit criteria (Appropriate, Understandable, and Generalizable). Contrary to
our hypotheses, answerers disagreed with and rejected the criteria-based
feedback. Although the criteria aligned with answerers' goals, and crowdsourced
ratings were found to be objectively accurate, the norms and expectations for
answers on Brainly conflicted with our design. We conclude with implications
for the design of feedback in social Q&A.
Comment: Published in Proceedings of the Fifth Annual ACM Conference on Learning at Scale, Article No. 10, London, United Kingdom, June 26-28, 201
Crowdsourcing subjective annotations using pairwise comparisons reduces bias and error compared to the majority-vote method
How to better reduce measurement variability and bias introduced by
subjectivity in crowdsourced labelling remains an open question. We introduce a
theoretical framework for understanding how random error and measurement bias
enter into crowdsourced annotations of subjective constructs. We then propose a
pipeline that combines pairwise comparison labelling with Elo scoring, and
demonstrate that it outperforms the ubiquitous majority-voting method in
reducing both types of measurement error. To assess the performance of the
labelling approaches, we constructed an agent-based model of crowdsourced
labelling that lets us introduce different types of subjectivity into the
tasks. We find that under most conditions with task subjectivity, the
comparison approach produced higher-quality scores. Further, the comparison
approach is less susceptible to inflating bias, which majority voting tends to
do. To facilitate applications, we show with simulated and real-world data that
the number of required random comparisons for the same classification accuracy
scales log-linearly with the number of labelled items. We also
implemented the Elo system as an open-source Python package.
Comment: Accepted for publication at ACM CSCW 202
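Though the abstract's released package is not reproduced here, the core of a pairwise-comparison pipeline is easy to sketch: sample random pairs of items, record which one an annotator prefers, and update Elo ratings from each outcome. A minimal sketch, with all names and parameters illustrative and the annotator simulated as noiseless:

```python
import random

def elo_rank(items, true_scores, n_comparisons, k=32, seed=0):
    """Rank items via random pairwise comparisons scored with Elo updates.

    A simulated noiseless annotator prefers the item with the higher
    latent score; each comparison updates both items' Elo ratings.
    """
    rng = random.Random(seed)
    ratings = {item: 1000.0 for item in items}
    for _ in range(n_comparisons):
        a, b = rng.sample(items, 2)
        # Annotator judgement: which item has the higher latent score?
        winner, loser = (a, b) if true_scores[a] >= true_scores[b] else (b, a)
        # Standard Elo expected score of the winner, then symmetric update.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1.0 - expected_win)
        ratings[loser] -= k * (1.0 - expected_win)
    return ratings

items = list(range(10))
true_scores = {i: i for i in items}          # item 9 is "best"
ratings = elo_rank(items, true_scores, n_comparisons=2000)
ranked = sorted(items, key=ratings.get, reverse=True)
```

Because each comparison touches only two items, the log-linear scaling of required comparisons reported in the abstract is what makes this practical at scale.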
Spam elimination and bias correction: ensuring label quality in crowdsourced tasks
Crowdsourcing is a powerful mechanism for accomplishing large-scale tasks via anonymous online workers. It has proven effective for collecting labeled data in domains that require human intelligence, such as image labeling, video annotation, and natural language processing. Despite this promise, one major challenge remains: controlling the quality of the crowd. Workers have diverse education levels, personal preferences, and motivations, so their performance on a crowdsourced task is unknown in advance; some are reliable, while others provide noisy feedback. Worker filtering, which recognizes and handles noisy workers, is therefore essential for obtaining high-quality labels. This dissertation contributes to this area of research by proposing efficient probabilistic worker-filtering models that distinguish different types of poor-quality workers. Most existing work on worker filtering either concentrates only on binary labeling tasks, or fails to separate low-quality workers whose label errors can be corrected from spam workers whose errors cannot. We therefore first propose a Spam Removing and De-biasing Framework (SRDF) for worker filtering in labeling tasks with numerical label scales. The framework detects spam workers and biased workers separately. Biased workers are those who tend to provide higher (or lower) labels than the truth, and their errors can be corrected. To tackle the biasing problem, an iterative bias-detection approach is introduced to recognize biased workers.
The spam-filtering algorithm eliminates three types of spam workers: random spammers, who provide random labels; uniform spammers, who give the same label for most items; and sloppy workers, who offer low-accuracy labels. Integrating the spam-filtering and bias-detection approaches into aggregation algorithms, which infer truths from crowd labels, yields high-quality consensus results. Random spammers and uniform spammers share the characteristic of providing useless feedback without effort, so it is unnecessary to distinguish between them. In addition, within the SRDF framework, the removal of sloppy workers strongly affects the detection of biased workers. To address these problems, this dissertation presents a different worker classification, in which biased workers are a subcategory of sloppy workers. An ITerative Self Correcting - Truth Discovery (ITSC-TD) framework is then proposed, which reliably recognizes biased workers in ordinal labeling tasks using a probabilistic bias-detection model. ITSC-TD estimates true labels with an optimization-based truth-discovery method that minimizes overall label error by assigning different weights to workers. Typical tasks posted on popular crowdsourcing platforms such as MTurk are simple: low in complexity, independent, and quick to complete. Complex tasks, by contrast, often require crowd workers to possess specialized skills in the task domain, and are therefore more prone to poor-quality feedback than simple tasks. We thus propose a multiple-views approach for obtaining high-quality consensus labels in complex labeling tasks.
In this approach, each view is a labeling critique or rubric that guides workers toward the desirable work characteristics or goals. Combining the view labels yields the overall estimated label for each item. The multiple-views approach rests on the hypothesis that workers' performance may differ from one view to another, so different weights are assigned to each view for each worker. Additionally, the ITSC-TD framework is integrated into the multiple-views model to achieve high-quality estimated truths for each view. Next, we propose a Semi-supervised Worker Filtering (SWF) model to eliminate spam workers who assign random labels to each item. SWF performs worker filtering with a limited set of gold truths available as a prior. Each worker is assigned a spammer score, estimated via the semi-supervised model, and low-quality workers are detected efficiently by comparing that score with a predefined threshold. The efficiency of all the developed frameworks and models is demonstrated on simulated and real-world data sets. Compared with state-of-the-art methods in crowdsourcing, such as expectation-maximization-based aggregation, GLAD, and optimization-based truth discovery, the proposed frameworks improve the accuracy of true-label estimation by up to 28.0%.
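The bias-correction idea behind frameworks like SRDF can be illustrated with a minimal sketch (not the dissertation's algorithm; the worker names and the single-pass offset estimate are simplifying assumptions): estimate each worker's mean offset from an initial consensus, then re-aggregate with that offset removed.

```python
import statistics

def debias_labels(labels_by_worker, n_items):
    """Correct systematic worker bias against an initial consensus.

    labels_by_worker: {worker: {item: numeric label}}. Each worker's bias
    is estimated as the mean difference between their labels and the
    per-item consensus, then subtracted before re-aggregating.
    """
    # Initial consensus: plain per-item mean over available labels.
    consensus = [
        statistics.mean(lab[i] for lab in labels_by_worker.values() if i in lab)
        for i in range(n_items)
    ]
    # Per-worker bias: average deviation from that consensus.
    bias = {
        worker: statistics.mean(lab[i] - consensus[i] for i in lab)
        for worker, lab in labels_by_worker.items()
    }
    # Re-aggregate using bias-corrected labels.
    corrected = [
        statistics.mean(lab[i] - bias[w]
                        for w, lab in labels_by_worker.items() if i in lab)
        for i in range(n_items)
    ]
    return corrected, bias

labels = {
    "fair": {0: 2, 1: 4, 2: 6, 3: 8},
    "high": {0: 3, 1: 5, 2: 7, 3: 9},   # systematically +1
    "low":  {0: 1, 1: 3, 2: 5, 3: 7},   # systematically -1
}
corrected, bias = debias_labels(labels, n_items=4)
```

In this toy example the two biased workers' offsets cancel exactly after correction; the iterative detection described above generalizes this single pass.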
Iterative Bayesian Learning for Crowdsourced Regression
Crowdsourcing platforms emerged as popular venues for purchasing human
intelligence at low cost for large volume of tasks. As many low-paid workers
are prone to give noisy answers, a common practice is to add redundancy by
assigning multiple workers to each task and then simply average out these
answers. However, to fully harness the wisdom of the crowd, one needs to learn
the heterogeneous quality of each worker. We resolve this fundamental challenge
in crowdsourced regression tasks, i.e., the answer takes continuous labels,
where identifying good or bad workers becomes much more non-trivial compared to
a classification setting of discrete labels. In particular, we introduce a
Bayesian iterative scheme and show that it provably achieves the optimal mean
squared error. Our evaluations on synthetic and real-world datasets support our
theoretical results and show the superiority of the proposed scheme.
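The underlying idea, learning per-worker noise levels and weighting answers accordingly, can be sketched with a plain alternating scheme (a simplified stand-in for the paper's Bayesian iterative algorithm; the worker names and data are made up):

```python
import statistics

def iterative_regression(labels, n_items, n_rounds=10):
    """Alternate between task estimates and worker-noise estimates.

    labels: {worker: {item: float}}. Task answers are precision-weighted
    means; each worker's noise variance is then re-estimated from their
    residuals against those answers.
    """
    var = {w: 1.0 for w in labels}                 # start with equal noise
    est = [0.0] * n_items
    for _ in range(n_rounds):
        # Step 1: precision-weighted average per task.
        for i in range(n_items):
            num = sum(lab[i] / var[w] for w, lab in labels.items() if i in lab)
            den = sum(1.0 / var[w] for w, lab in labels.items() if i in lab)
            est[i] = num / den
        # Step 2: per-worker residual variance (floored to stay positive).
        for w, lab in labels.items():
            var[w] = max(statistics.mean((lab[i] - est[i]) ** 2 for i in lab),
                         1e-6)
    return est, var

labels = {
    "a": {0: 1.0, 1: 2.0, 2: 3.0},
    "b": {0: 1.1, 1: 1.9, 2: 3.1},
    "c": {0: 3.0, 1: 0.0, 2: 5.0},   # much noisier than a and b
}
est, var = iterative_regression(labels, n_items=3)
```

The noisy worker ends up with a large estimated variance and little influence on the answers, which is exactly the "learn the heterogeneous quality of each worker" step that plain averaging skips.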
Optimal Inference in Crowdsourced Classification via Belief Propagation
Crowdsourcing systems are popular for solving large-scale labelling tasks
with low-paid workers. We study the problem of recovering the true labels from
the possibly erroneous crowdsourced labels under the popular Dawid-Skene model.
To address this inference problem, several algorithms have recently been
proposed, but the best known guarantee is still significantly larger than the
fundamental limit. We close this gap by introducing a tighter lower bound on
the fundamental limit and proving that Belief Propagation (BP) exactly matches
this lower bound. The guaranteed optimality of BP is the strongest in the sense
that it is information-theoretically impossible for any other algorithm to
correctly label a larger fraction of the tasks. Experimental results suggest
that BP is close to optimal for all regimes considered and improves upon
competing state-of-the-art algorithms.
Comment: This article is partially based on preliminary results published in the proceedings of the 33rd International Conference on Machine Learning (ICML 2016).
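Belief propagation itself is too involved for a short sketch, but the Dawid-Skene setting and the majority-vote baseline it improves on are easy to simulate (a one-coin model; the reliabilities and task count below are arbitrary):

```python
import random

def simulate_majority(n_tasks, reliabilities, seed=0):
    """Simulate one-coin Dawid-Skene workers and aggregate by majority vote.

    Each task has a true label in {-1, +1}; worker j reports it correctly
    with probability reliabilities[j]. Returns the fraction of tasks whose
    majority-voted label matches the truth.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_tasks):
        truth = rng.choice([-1, 1])
        # Signed vote total over all workers; odd worker count avoids ties.
        votes = sum(truth if rng.random() < p else -truth
                    for p in reliabilities)
        correct += (votes > 0) == (truth > 0)
    return correct / n_tasks

# Five workers of mixed quality, all better than chance.
acc = simulate_majority(2000, [0.9, 0.7, 0.7, 0.6, 0.55])
```

Majority voting ignores the heterogeneity of the reliabilities; inference algorithms such as BP recover the labels more accurately by implicitly weighting reliable workers more, which is the gap to the fundamental limit the paper analyzes.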
Globally Optimal Crowdsourcing Quality Management
We study crowdsourcing quality management, that is, given worker responses to
a set of tasks, our goal is to jointly estimate the true answers for the tasks,
as well as the quality of the workers. Prior work on this problem relies
primarily on applying Expectation-Maximization (EM) on the underlying maximum
likelihood problem to estimate true answers as well as worker quality.
Unfortunately, EM only provides a locally optimal solution rather than a
globally optimal one. Other solutions to the problem (that do not leverage EM)
fail to provide global optimality guarantees as well. In this paper, we focus
on filtering, where tasks require the evaluation of a yes/no predicate, and
rating, where tasks elicit integer scores from a finite domain. We design
algorithms for finding the global optimal estimates of correct task answers and
worker quality for the underlying maximum likelihood problem, and characterize
the complexity of these algorithms. Our algorithms conceptually consider all
mappings from tasks to true answers (typically a very large number), leveraging
two key ideas to reduce, by several orders of magnitude, the number of mappings
under consideration, while preserving optimality. We also demonstrate that
these algorithms often find more accurate estimates than EM-based algorithms.
This paper makes an important contribution towards understanding the inherent
complexity of globally optimal crowdsourcing quality management.
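The EM baseline the paper contrasts with can be sketched for the filtering (yes/no) setting as follows. This is a generic one-coin EM, not the paper's globally optimal algorithm; the worker names, initial values, and clipping bounds are illustrative choices:

```python
def em_filtering(labels, n_tasks, n_rounds=20):
    """EM-style joint estimation of yes/no task answers and worker accuracy.

    labels: {worker: {task: 0 or 1}}. Alternates between soft task
    estimates (accuracy-weighted likelihoods) and worker-accuracy updates.
    """
    acc = {w: 0.7 for w in labels}                 # initial accuracy guess
    prob = [0.5] * n_tasks                         # P(task answer == 1)
    for _ in range(n_rounds):
        # E-step: posterior of each task being 1 given worker accuracies.
        for t in range(n_tasks):
            like1 = like0 = 1.0
            for w, lab in labels.items():
                if t in lab:
                    a = acc[w]
                    like1 *= a if lab[t] == 1 else (1 - a)
                    like0 *= a if lab[t] == 0 else (1 - a)
            prob[t] = like1 / (like1 + like0)
        # M-step: accuracy = expected fraction of a worker's labels that
        # agree with the current soft answers (clipped away from 0 and 1).
        for w, lab in labels.items():
            agree = sum(prob[t] if lab[t] == 1 else 1 - prob[t] for t in lab)
            acc[w] = min(max(agree / len(lab), 0.05), 0.95)
    answers = [int(p > 0.5) for p in prob]
    return answers, acc

labels = {
    "w1":  {0: 1, 1: 0, 2: 1, 3: 1},
    "w2":  {0: 1, 1: 0, 2: 1, 3: 0},
    "w3":  {0: 1, 1: 0, 2: 0, 3: 1},
    "bad": {0: 0, 1: 1, 2: 0, 3: 0},   # answers inverted throughout
}
answers, acc = em_filtering(labels, n_tasks=4)
```

EM of this kind converges to a local optimum of the likelihood and may depend on the initial accuracy guess; escaping that limitation is precisely the paper's contribution.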