Capturing Ambiguity in Crowdsourcing Frame Disambiguation
FrameNet is a computational linguistics resource composed of semantic frames,
high-level concepts that represent the meanings of words. In this paper, we
present an approach to gather frame disambiguation annotations in sentences
using a crowdsourcing approach with multiple workers per sentence to capture
inter-annotator disagreement. We perform an experiment over a set of 433
sentences annotated with frames from the FrameNet corpus, and show that the
aggregated crowd annotations achieve an F1 score greater than 0.67 when
compared against expert linguists. We highlight cases where the crowd
annotation was correct even though the expert disagreed, arguing for the need to have
multiple annotators per sentence. Most importantly, we examine cases in which
crowd workers could not agree, and demonstrate that these cases exhibit
ambiguity, either in the sentence, frame, or the task itself, and argue that
collapsing such cases to a single, discrete truth value (i.e. correct or
incorrect) is inappropriate, creating arbitrary targets for machine learning.
Comment: in publication at the sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP) 2018
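The aggregation and scoring described above can be pictured with a minimal, hypothetical sketch (not the authors' code; frame names and the majority threshold are illustrative): each sentence is annotated by several workers, frames selected by at least half of them are kept, and the aggregate is scored against the expert's frames with F1.

```python
from collections import Counter

def aggregate_frames(worker_annotations, min_support=0.5):
    """Keep frames chosen by at least `min_support` of the workers."""
    counts = Counter(f for frames in worker_annotations for f in frames)
    n_workers = len(worker_annotations)
    return {f for f, c in counts.items() if c / n_workers >= min_support}

def f1_against_expert(crowd, expert):
    """F1 of the aggregated crowd frames vs. the expert's frames."""
    tp = len(crowd & expert)
    precision = tp / len(crowd) if crowd else 0.0
    recall = tp / len(expert) if expert else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical sentence: five workers annotate the target word "run".
workers = [{"Self_motion"}, {"Self_motion"},
           {"Self_motion", "Operating_a_system"},
           {"Operating_a_system"}, {"Self_motion"}]
crowd = aggregate_frames(workers)                  # {"Self_motion"}
print(f1_against_expert(crowd, {"Self_motion"}))   # 1.0
```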
Television and the future internet: the NoTube project
[1st paragraph] 'New technology is transforming the TV industry', Mark Thompson, BBC Director General, told the newspaper The Observer. The classic notion of TV as a set in the living room with finite channels and linear programming is already gone: TV has moved into the world of Internet and mobile technology, and content is growing exponentially in both volume and diversity. The notion of channels is being replaced by individual choice and on-demand programming. Distinctions between TV and other streaming content are blurred: both live in a shared, connected online world. We expect that as the Future Internet develops, TV will complete this disruptive paradigm shift into becoming ubiquitous, always-available, and increasingly personalized. NoTube is an EU-funded project (in Objective 4.3, Intelligent Information Management) which began in February 2009 and runs for three years, with the goal of preparing TV for the Future Internet by addressing the challenges of TV content ubiquity and choice, personalization and integration.
A Crowdsourced Frame Disambiguation Corpus with Ambiguity
We present a resource for the task of FrameNet semantic frame disambiguation
of over 5,000 word-sentence pairs from the Wikipedia corpus. The annotations
were collected using a novel crowdsourcing approach with multiple workers per
sentence to capture inter-annotator disagreement. In contrast to the typical
approach of attributing the best single frame to each word, we provide a list
of frames with disagreement-based scores that express the confidence with which
each frame applies to the word. This is based on the idea that inter-annotator
disagreement is at least partly caused by ambiguity that is inherent to the
text and frames. We have found many examples where the semantics of individual
frames overlap sufficiently to make them acceptable alternatives for
interpreting a sentence. We have argued that ignoring this ambiguity creates an
overly arbitrary target for training and evaluating natural language processing
systems - if humans cannot agree, why would we expect the correct answer from a
machine to be any different? To process this data, we also utilized an expanded
lemma set provided by the Framester system, which merges FrameNet with WordNet
to enhance coverage. Our dataset includes annotations of 1,000 sentence-word
pairs whose lemmas are not part of FrameNet. Finally, we present metrics for
evaluating frame disambiguation systems that account for ambiguity.
Comment: Accepted to NAACL-HLT 2019
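The ambiguity-aware evaluation the abstract mentions can be illustrated with a short, hypothetical sketch (not the paper's actual metric; identifiers and scores are invented): instead of marking a prediction right or wrong against one gold frame, a system's predicted frame earns partial credit equal to the crowd's disagreement-based confidence in that frame.

```python
def soft_accuracy(predictions, crowd_scores):
    """Score predictions against crowd confidence, not a single gold frame.

    predictions: {pair_id: predicted_frame}
    crowd_scores: {pair_id: {frame: confidence in [0, 1]}}
    A prediction earns credit equal to the crowd's confidence in that frame,
    so choosing either of two genuinely ambiguous frames is not a hard error.
    """
    total = sum(crowd_scores[pid].get(frame, 0.0)
                for pid, frame in predictions.items())
    return total / len(predictions)

# Hypothetical pair where the crowd split between two overlapping frames:
scores = {"s1": {"Awareness": 0.6, "Certainty": 0.6}}
print(soft_accuracy({"s1": "Certainty"}, scores))  # 0.6, not 0.0
```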
Crowdsourcing Semantic Label Propagation in Relation Classification
Distant supervision is a popular method for performing relation extraction
from text, but it is known to produce noisy labels. Most progress in relation
extraction and classification has been made with crowdsourced corrections to
distant-supervised labels, and there is evidence that still more corrections
would help. In this paper, we explore the problem of propagating human
annotation signals gathered for open-domain relation classification through the
CrowdTruth crowdsourcing methodology, which captures ambiguity in
annotations by measuring inter-annotator disagreement. Our approach propagates
annotations to sentences that are similar in a low dimensional embedding space,
expanding the number of labels by two orders of magnitude. Our experiments show
significant improvement in a sentence-level multi-class relation classifier.
Comment: In publication at the First Workshop on Fact Extraction and Verification (FEVER) at EMNLP 2018
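A minimal sketch of the propagation idea (illustrative only; the paper's embedding model, similarity threshold, and label format are not specified here): sentence embeddings are compared by cosine similarity, and each unlabeled sentence inherits the relation scores of its nearest sufficiently similar labeled sentence.

```python
import numpy as np

def propagate_labels(labeled_vecs, labels, unlabeled_vecs, threshold=0.8):
    """Copy each labeled sentence's relation scores to unlabeled sentences
    that are close to it in embedding space (cosine similarity >= threshold)."""
    # Normalize rows so a dot product equals cosine similarity.
    L = labeled_vecs / np.linalg.norm(labeled_vecs, axis=1, keepdims=True)
    U = unlabeled_vecs / np.linalg.norm(unlabeled_vecs, axis=1, keepdims=True)
    sims = U @ L.T                       # shape: (n_unlabeled, n_labeled)
    propagated = {}
    for i, row in enumerate(sims):
        j = int(row.argmax())            # nearest labeled sentence
        if row[j] >= threshold:
            propagated[i] = labels[j]    # inherit its relation scores
    return propagated

# Hypothetical 3-dim embeddings and CrowdTruth-style relation scores:
labeled = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = [{"born-in": 0.9}, {"employs": 0.7}]
unlabeled = np.array([[0.95, 0.05, 0.0], [0.0, 0.2, 0.98]])
print(propagate_labels(labeled, labels, unlabeled))  # {0: {'born-in': 0.9}}
```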
Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study
Event detection is still a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve a lower F1-score than systems for other entity types, such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions, and propose a data-agnostic methodology for validating such datasets along both dimensions. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated datasets of events. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that crowd-annotated events increase the performance of the system by at least 5.3%.
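One way to picture the consistency check referred to above (a hypothetical reading, not the paper's actual procedure): scan the corpus for surface forms that experts marked as events in some sentences but left unannotated elsewhere, and flag them as candidates for crowd review.

```python
from collections import defaultdict

def find_inconsistent_mentions(corpus):
    """Flag tokens that are sometimes annotated as events and sometimes not.

    corpus: list of (token, is_event) pairs drawn from annotated sentences.
    Returns tokens whose annotations conflict across the dataset.
    """
    seen = defaultdict(set)
    for token, is_event in corpus:
        seen[token.lower()].add(is_event)
    return sorted(t for t, flags in seen.items() if len(flags) > 1)

corpus = [("attack", True), ("meeting", True), ("attack", False), ("said", False)]
print(find_inconsistent_mentions(corpus))  # ['attack']
```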
Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation
Big data is having a disruptive impact across the sciences. Human annotation of semantic interpretation tasks is a critical part of big data semantics, but it is based on an antiquated ideal of a single correct truth that needs to be similarly disrupted. We expose seven myths about human annotation, most of which derive from that antiquated ideal of truth, and dispel these myths with examples from our research. We propose a new theory of truth, crowd truth, based on the intuition that human interpretation is subjective, and that measuring annotations on the same objects of interpretation (in our examples, sentences) across a crowd provides a useful representation of their subjectivity and the range of reasonable interpretations.
Accurator: Nichesourcing for Cultural Heritage
With more and more cultural heritage data being published online, their
usefulness in this open context depends on the quality and diversity of
descriptive metadata for collection objects. In many cases, existing metadata
is not adequate for a variety of retrieval and research tasks and more specific
annotations are necessary. However, eliciting such annotations is a challenge
since it often requires domain-specific knowledge. While crowdsourcing can be
used successfully to elicit simple annotations, identifying people with the
required expertise may prove troublesome for tasks requiring more complex or
domain-specific knowledge. Nichesourcing addresses this problem by tapping
into the expert knowledge available in niche communities. This paper presents
Accurator, a methodology for conducting nichesourcing campaigns for cultural
heritage institutions, by addressing communities, organizing events and
tailoring a web-based annotation tool to a domain of choice. The contribution
of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation
tool for experts and 3) validation of the methodology and tool in three case
studies. The three domains of the case studies are birds on art, Bible prints
and fashion images. We compare the quality and quantity of obtained annotations
in the three case studies, showing that the nichesourcing methodology in
combination with the image annotation tool can be used to collect high-quality
annotations in a variety of domains and annotation tasks. A user evaluation
indicates that the tool is suitable and usable for domain-specific annotation tasks.
CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement
Typically, crowdsourcing-based approaches to gathering annotated data use
inter-annotator agreement as a measure of quality. However, in many domains
there is ambiguity in the data, as well as a multitude of perspectives on the
annotated examples. In this paper, we present ongoing work on the
CrowdTruth metrics, which capture and interpret inter-annotator disagreement in
crowdsourcing. The CrowdTruth metrics model the inter-dependency between the
three main components of a crowdsourcing system -- worker, input data, and
annotation. The goal of the metrics is to capture the degree of ambiguity in
each of these three components. The metrics are available online at
https://github.com/CrowdTruth/CrowdTruth-core
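As an illustration of what such metrics compute, here is a deliberately simplified, one-pass sketch. The actual CrowdTruth 2.0 metrics in the linked repository weight these scores by worker and annotation quality and iterate to a fixed point; this sketch omits that and is an assumption-laden illustration, not the library's API.

```python
import numpy as np

def crowdtruth_sketch(vectors):
    """Simplified disagreement-aware quality scores for one input unit.

    vectors: array of shape (n_workers, n_annotations); each row is one
    worker's binary annotation vector. The real metrics are mutually
    recursive over workers, units, and annotations; this is one pass.
    """
    unit_vec = vectors.sum(axis=0)
    # Unit-annotation score: fraction of workers selecting each annotation.
    unit_annotation = unit_vec / len(vectors)

    # Worker score: cosine between a worker's vector and the aggregate
    # of all *other* workers' vectors (low for outlier workers).
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    worker_quality = [cos(v, unit_vec - v) for v in vectors]
    return unit_annotation, worker_quality

# Three workers annotate one unit over three candidate annotations:
v = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 1]])
ua, wq = crowdtruth_sketch(v)
print(ua)  # approx. [0.667, 0.333, 0.333]
print(wq)  # first two workers agree; the third is the outlier (score 0)
```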