Leveraging Crowdsourcing Data For Deep Active Learning - An Application: Learning Intents in Alexa
This paper presents a generic Bayesian framework that enables any deep
learning model to actively learn from targeted crowds. Our framework inherits
from recent advances in Bayesian deep learning, and extends existing work by
considering the targeted crowdsourcing approach, where multiple annotators with
unknown expertise contribute an uncontrolled amount (often limited) of
annotations. Our framework leverages the low-rank structure in annotations to
learn individual annotator expertise, which then helps to infer the true labels
from noisy and sparse annotations. It provides a unified Bayesian model to
simultaneously infer the true labels and train the deep learning model in order
to reach an optimal learning efficacy. Finally, our framework exploits the
uncertainty of the deep learning model during prediction as well as the
annotators' estimated expertise to minimize the number of required annotations
and annotators for optimally training the deep learning model.
We evaluate the effectiveness of our framework for intent classification in
Alexa (Amazon's personal assistant), using both synthetic and real-world
datasets. Experiments show that our framework can accurately learn annotator
expertise, infer true labels, and effectively reduce the amount of annotations
in model training as compared to state-of-the-art approaches. We further
discuss the potential of our proposed framework in bridging machine learning
and crowdsourcing towards improved human-in-the-loop systems.
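The idea of using model uncertainty to decide which items to send for annotation can be illustrated with plain entropy-based uncertainty sampling. This is a minimal sketch, not the Bayesian framework the abstract describes: it ignores annotator expertise entirely, and the names `predictive_entropy`, `select_queries`, and the example probabilities are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(predictions, budget):
    """Return the ids of the `budget` items the model is least sure about.

    `predictions` maps an item id to its predicted class probabilities.
    Assumption: higher entropy means the item is more worth annotating.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: predictive_entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for three unlabeled utterances.
predictions = {
    "a": [0.9, 0.1],
    "b": [0.5, 0.5],
    "c": [0.7, 0.3],
}
queries = select_queries(predictions, budget=2)
```

Here the near-uniform prediction for item `b` ranks first, so it would be sent to the crowd before the confidently classified item `a`.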
Modeling with the Crowd: Optimizing the Human-Machine Partnership with Zooniverse
LSST and Euclid must address the daunting challenge of analyzing the
unprecedented volumes of imaging and spectroscopic data that these
next-generation instruments will generate. A promising approach to overcoming
this challenge involves rapid, automatic image processing using appropriately
trained Deep Learning (DL) algorithms. However, reliable application of DL
requires large, accurately labeled samples of training data. Galaxy Zoo Express
(GZX) is a recent experiment that simulated using Bayesian inference to
dynamically aggregate binary responses provided by citizen scientists via the
Zooniverse crowd-sourcing platform in real time. The GZX approach enables
collaboration between human and machine classifiers and provides rapidly
generated, reliably labeled datasets, thereby enabling online training of
accurate machine classifiers. We present selected results from GZX and show how
the Bayesian aggregation engine it uses can be extended to efficiently provide
object-localization and bounding-box annotations of two-dimensional data with
quantified reliability. DL algorithms that are trained using these annotations
will facilitate numerous panchromatic data modeling tasks including
morphological classification and substructure detection in direct imaging, as
well as decontamination and emission line identification for slitless
spectroscopy. Effectively combining the speed of modern computational analyses
with the human capacity to extrapolate from few examples will be critical if
the potential of forthcoming large-scale surveys is to be realized.
Comment: 5 pages, 1 figure. To appear in Proceedings of the International
Astronomical Unio
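Dynamic Bayesian aggregation of binary responses can be sketched as a sequential posterior update that stops once the label is confident enough. The sketch below is an illustration under stated assumptions, not the GZX engine itself: each volunteer is assumed to have known rates of answering correctly for each true class, and the function names, the 0.5 prior, and the 0.95 / 0.05 stopping thresholds are all hypothetical.

```python
def update_posterior(prior, response, p_true_pos, p_true_neg):
    """Return P(label = 1 | response), given P(label = 1) = prior.

    p_true_pos: assumed P(volunteer answers 1 | label is 1)
    p_true_neg: assumed P(volunteer answers 0 | label is 0)
    """
    if response == 1:
        like_pos = p_true_pos
        like_neg = 1.0 - p_true_neg
    else:
        like_pos = 1.0 - p_true_pos
        like_neg = p_true_neg
    evidence = like_pos * prior + like_neg * (1.0 - prior)
    return like_pos * prior / evidence


def aggregate(responses, prior=0.5, accept=0.95, reject=0.05):
    """Fold responses in one at a time; stop early once a threshold is crossed.

    `responses` is a list of (response, p_true_pos, p_true_neg) tuples.
    Returns the final posterior and the number of responses consumed.
    """
    posterior = prior
    used = 0
    for response, p_true_pos, p_true_neg in responses:
        posterior = update_posterior(posterior, response, p_true_pos, p_true_neg)
        used += 1
        if posterior >= accept or posterior <= reject:
            break
    return posterior, used


# Five hypothetical volunteers, each 80% accurate, all answering "1".
posterior, used = aggregate([(1, 0.8, 0.8)] * 5)
```

With these assumed accuracies the item crosses the acceptance threshold after three responses, so the remaining two volunteers are never needed for it; this early retirement is what makes the aggregation "dynamic".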
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.
Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201
Capturing Ambiguity in Crowdsourcing Frame Disambiguation
FrameNet is a computational linguistics resource composed of semantic frames,
high-level concepts that represent the meanings of words. In this paper, we
present an approach to gather frame disambiguation annotations in sentences
using a crowdsourcing approach with multiple workers per sentence to capture
inter-annotator disagreement. We perform an experiment over a set of 433
sentences annotated with frames from the FrameNet corpus, and show that the
aggregated crowd annotations achieve an F1 score greater than 0.67 as compared
to expert linguists. We highlight cases where the crowd annotation was correct
even though the expert is in disagreement, arguing for the need to have
multiple annotators per sentence. Most importantly, we examine cases in which
crowd workers could not agree, and demonstrate that these cases exhibit
ambiguity, either in the sentence, frame, or the task itself, and argue that
collapsing such cases to a single, discrete truth value (i.e. correct or
incorrect) is inappropriate, creating arbitrary targets for machine learning.
Comment: in publication at the sixth AAAI Conference on Human Computation and
Crowdsourcing (HCOMP) 201
Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives of the information
examples. We present an empirically derived methodology for efficiently
gathering ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical
Relation Extraction, Twitter Event Identification, News Event Extraction and
Sound Interpretation. We also show that an increased number of crowd workers
leads to growth and stabilization in the quality of annotations, going against
the usual practice of employing a small number of annotators.
Comment: in publication at the Semantic Web Journa
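The contrast between majority vote and a disagreement-preserving aggregate can be shown on a toy example. This sketch does not implement the CrowdTruth metrics; it only compares a hard majority label with the per-label vote fractions, and the function names and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Collapse worker labels to the single most frequent one.

    Ties are broken by whichever label was seen first.
    """
    return Counter(labels).most_common(1)[0][0]


def label_fractions(labels):
    """Keep the disagreement: fraction of workers choosing each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Five hypothetical workers annotating one sentence.
votes = ["cause", "cause", "treat", "treat", "cause"]
hard_label = majority_vote(votes)
soft_label = label_fractions(votes)
```

The hard label reports only `cause`, while the fractions show that two of the five workers chose `treat`, which is the kind of signal a majority vote discards.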
Detecting animals in African Savanna with UAVs and the crowds
Unmanned aerial vehicles (UAVs) offer new opportunities for wildlife
monitoring, with several advantages over traditional field-based methods. They
have readily been used to count birds, marine mammals and large herbivores in
different environments, tasks which are routinely performed through manual
counting in large collections of images. In this paper, we propose a
semi-automatic system able to detect large mammals in semi-arid Savanna. It
relies on an animal-detection system based on machine learning, trained with
crowd-sourced annotations provided by volunteers who manually interpreted
sub-decimeter resolution color images. The system achieves a high recall rate
and a human operator can then eliminate false detections with limited effort.
Our system offers good prospects for the development of data-driven
management practices in wildlife conservation. It shows that the detection of
large mammals in semi-arid Savanna can be approached by processing data
provided by standard RGB cameras mounted on affordable fixed-wing UAVs.