Repeated Labeling Using Multiple Noisy Labelers
This paper addresses the repeated acquisition of labels for data items
when the labeling is imperfect. We examine the improvement (or lack
thereof) in data quality via repeated labeling, and focus especially on
the improvement of training labels for supervised induction. With the
outsourcing of small tasks becoming easier, for example via Amazon's
Mechanical Turk, it often is possible to obtain less-than-expert
labeling at low cost. With low-cost labeling, preparing the unlabeled
part of the data can become considerably more expensive than labeling.
We present repeated-labeling strategies of increasing complexity, and
show several main results. (i) Repeated-labeling can improve label
quality and model quality, but not always. (ii) When labels are noisy,
repeated labeling can be preferable to single labeling even in the
traditional setting where labels are not particularly cheap. (iii) As
soon as processing the unlabeled data carries any cost, even the
simple strategy of labeling everything multiple times can give
considerable advantage. (iv) Repeatedly labeling a carefully chosen set
of points is generally preferable, and we present a set of robust
techniques that combine different notions of uncertainty to select data
points for which quality should be improved. The bottom line: the
results show clearly that when labeling is not perfect, selective
acquisition of multiple labels is a strategy that data miners should
have in their repertoire. For certain label-quality/cost regimes, the
benefit is substantial. This work was supported by the National Science Foundation under Grant
No. IIS-0643846, by an NSERC Postdoctoral Fellowship, and by an NEC
Faculty Fellowship.
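To make the simplest repeated-labeling strategy concrete, the sketch below (an illustration, not code from the paper) aggregates multiple noisy labels per item by majority vote and computes the resulting integrated label quality, assuming binary labels and independent labelers who are each correct with probability p; the function names and the equal-accuracy assumption are mine, not the authors'.

```python
from collections import Counter
from math import comb

def majority_vote(labels):
    """Aggregate repeated labels for one item by majority vote (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def integrated_quality(p, n):
    """Probability that a majority vote over n independent labels is correct,
    assuming binary labels and per-label accuracy p (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

# Example: labeler accuracy 0.7; quality of 1 vs 5 vs 11 labels per item
for n in (1, 5, 11):
    print(n, round(integrated_quality(0.7, n), 3))
```

Under these assumptions, per-label accuracy 0.7 yields an integrated quality of roughly 0.84 with five labels and 0.92 with eleven, which is the regime in which paying for repeated labels can beat labeling more items once.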
Supervised Collective Classification for Crowdsourcing
Crowdsourcing utilizes the wisdom of crowds for collective classification via
information (e.g., labels of an item) provided by labelers. Current
crowdsourcing algorithms are mainly unsupervised methods that are unaware of
the quality of crowdsourced data. In this paper, we propose a supervised
collective classification algorithm that aims to identify reliable labelers
from the training data (e.g., items with known labels). The reliability (i.e.,
weighting factor) of each labeler is determined via a saddle point algorithm.
The results on several crowdsourced data show that supervised methods can
achieve better classification accuracy than unsupervised methods, and our
proposed method outperforms other algorithms. Comment: to appear in IEEE Global Communications Conference (GLOBECOM)
Workshop on Networking and Collaboration Issues for the Internet of
Everything.
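As a rough illustration of the supervised idea (the paper itself determines labeler weights with a saddle point algorithm, not shown here), the sketch below estimates each labeler's reliability as accuracy on training items with known labels and then combines labels by reliability-weighted voting; all names and the default weight are illustrative assumptions.

```python
import numpy as np

def estimate_reliability(train_labels, train_truth):
    """Estimate a weight per labeler as accuracy on items with known labels.
    train_labels: {labeler: {item: label}}; train_truth: {item: true_label}."""
    weights = {}
    for labeler, answers in train_labels.items():
        scored = [answers[i] == y for i, y in train_truth.items() if i in answers]
        weights[labeler] = float(np.mean(scored)) if scored else 0.5
    return weights

def weighted_vote(item_labels, weights):
    """Combine one item's labels by reliability-weighted voting.
    item_labels: {labeler: label}."""
    scores = {}
    for labeler, label in item_labels.items():
        scores[label] = scores.get(label, 0.0) + weights.get(labeler, 0.5)
    return max(scores, key=scores.get)
```

The point of the supervised setting is visible even in this simplified form: labelers who disagree with the known training labels contribute less to the final decision than labelers who match them.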
Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data
Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences
Learning From Noisy Singly-labeled Data
Supervised learning depends on annotated examples, which are taken to be the
\emph{ground truth}. But these labels often come from noisy crowdsourcing
platforms, like Amazon Mechanical Turk. Practitioners typically collect
multiple labels per example and aggregate the results to mitigate noise (the
classic crowdsourcing problem). Given a fixed annotation budget and unlimited
unlabeled data, redundant annotation comes at the expense of fewer labeled
examples. This raises two fundamental questions: (1) How can we best learn from
noisy workers? (2) How should we allocate our labeling budget to maximize the
performance of a classifier? We propose a new algorithm for jointly modeling
labels and worker quality from noisy crowd-sourced data. The alternating
minimization proceeds in rounds, estimating worker quality from disagreement
with the current model and then updating the model by optimizing a loss
function that accounts for the current estimate of worker quality. Unlike
previous approaches, even with only one annotation per example, our algorithm
can estimate worker quality. We establish a generalization error bound for
models learned with our algorithm and show theoretically that it is better
to label many examples once (rather than fewer examples multiple times) when worker quality is above a
threshold. Experiments conducted on both ImageNet (with simulated noisy
workers) and MS-COCO (using the real crowdsourced labels) confirm our
algorithm's benefits. Comment: 18 pages, 3 figures.
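The alternating scheme lends itself to a short sketch. The version below is an illustration under assumed choices (a logistic-regression learner, a fixed round count, and clipping of quality estimates), not the authors' implementation: it alternates between fitting a classifier on the singly labeled examples, weighting each example by its worker's current estimated quality, and re-estimating each worker's quality as agreement with the model's predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_with_noisy_workers(X, y_noisy, worker_ids, n_rounds=5):
    """Alternating-minimization sketch: estimate per-worker quality from agreement
    with the current model, then refit the model with quality-weighted labels.
    X: feature matrix; y_noisy: one noisy label per example; worker_ids: int array
    giving the worker who supplied each label."""
    n_workers = int(worker_ids.max()) + 1
    quality = np.full(n_workers, 0.75)            # optimistic initial quality estimate
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        # Fit the classifier, weighting each (singly labeled) example
        # by its worker's current estimated quality.
        model.fit(X, y_noisy, sample_weight=quality[worker_ids])
        # Re-estimate each worker's quality as agreement with the model.
        preds = model.predict(X)
        for w in range(n_workers):
            mask = worker_ids == w
            if mask.any():
                quality[w] = np.clip((preds[mask] == y_noisy[mask]).mean(), 0.05, 0.95)
    return model, quality
```

Because worker quality is estimated from disagreement with the model rather than from redundant labels, the scheme works even when each example carries only a single annotation, which is the setting the paper targets.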
An introduction to crowdsourcing for language and multimedia technology research
Language and multimedia technology research often relies on large, manually constructed datasets for training or evaluating algorithms and systems. Constructing these datasets is often expensive, with significant challenges in recruiting personnel to carry out the work. Crowdsourcing methods that use scalable pools of workers available on demand offer a flexible means of rapidly constructing many of these datasets at low cost, supporting existing research requirements and potentially enabling new research initiatives that would otherwise not be possible.