Capturing Ambiguity in Crowdsourcing Frame Disambiguation
FrameNet is a computational linguistics resource composed of semantic frames,
high-level concepts that represent the meanings of words. In this paper, we
present an approach to gather frame disambiguation annotations in sentences
using a crowdsourcing approach with multiple workers per sentence to capture
inter-annotator disagreement. We perform an experiment over a set of 433
sentences annotated with frames from the FrameNet corpus, and show that the
aggregated crowd annotations achieve an F1 score greater than 0.67 when
compared against expert linguists. We highlight cases where the crowd
annotation was correct even though the expert disagreed, arguing for the need to have
multiple annotators per sentence. Most importantly, we examine cases in which
crowd workers could not agree, and demonstrate that these cases exhibit
ambiguity, either in the sentence, frame, or the task itself, and argue that
collapsing such cases to a single, discrete truth value (i.e. correct or
incorrect) is inappropriate, creating arbitrary targets for machine learning.
Comment: in publication at the sixth AAAI Conference on Human Computation and Crowdsourcing (HCOMP) 2018
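The aggregation and scoring described above can be pictured with a minimal, hypothetical sketch (not the authors' code; frame names and the majority threshold are illustrative): each sentence is annotated by several workers, frames selected by at least half of them are kept, and the aggregate is scored against the expert's frames with F1.

```python
from collections import Counter

def aggregate_frames(worker_annotations, min_support=0.5):
    """Keep frames chosen by at least `min_support` of the workers."""
    counts = Counter(f for frames in worker_annotations for f in frames)
    n_workers = len(worker_annotations)
    return {f for f, c in counts.items() if c / n_workers >= min_support}

def f1_against_expert(crowd, expert):
    """F1 of the aggregated crowd frames vs. the expert's frames."""
    tp = len(crowd & expert)
    precision = tp / len(crowd) if crowd else 0.0
    recall = tp / len(expert) if expert else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical sentence: five workers annotate the target word "run".
workers = [{"Self_motion"}, {"Self_motion"},
           {"Self_motion", "Operating_a_system"},
           {"Operating_a_system"}, {"Self_motion"}]
crowd = aggregate_frames(workers)                  # {"Self_motion"}
print(f1_against_expert(crowd, {"Self_motion"}))   # 1.0
```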
Television and the future internet: the NoTube project
[1st paragraph] 'New technology is transforming the TV industry', Mark Thompson, BBC Director General, told the newspaper The Observer. The classic notion of TV as a set in the living room with finite channels and linear programming is already gone: TV has moved into the world of Internet and mobile technology, and content is growing exponentially in both volume and diversity. The notion of channels is being replaced by individual choice and on-demand programming. Distinctions between TV and other streaming content are blurred: both live in a shared, connected online world. We expect that as the Future Internet develops, TV will complete this disruptive paradigm shift into becoming ubiquitous, always-available, and increasingly personalized. NoTube is an EU-funded project (in Objective 4.3, Intelligent Information Management) which began in February 2009 and runs for three years, with the goal of preparing TV for the Future Internet by addressing the challenges of TV content ubiquity and choice, personalization and integration.
A Crowdsourced Frame Disambiguation Corpus with Ambiguity
We present a resource for the task of FrameNet semantic frame disambiguation
of over 5,000 word-sentence pairs from the Wikipedia corpus. The annotations
were collected using a novel crowdsourcing approach with multiple workers per
sentence to capture inter-annotator disagreement. In contrast to the typical
approach of attributing the best single frame to each word, we provide a list
of frames with disagreement-based scores that express the confidence with which
each frame applies to the word. This is based on the idea that inter-annotator
disagreement is at least partly caused by ambiguity that is inherent to the
text and frames. We have found many examples where the semantics of individual
frames overlap sufficiently to make them acceptable alternatives for
interpreting a sentence. We have argued that ignoring this ambiguity creates an
overly arbitrary target for training and evaluating natural language processing
systems - if humans cannot agree, why would we expect the correct answer from a
machine to be any different? To process this data, we also utilized an expanded
lemma set provided by the Framester system, which merges FrameNet with WordNet
to enhance coverage. Our dataset includes annotations of 1,000 sentence-word
pairs whose lemmas are not part of FrameNet. Finally, we present metrics for
evaluating frame disambiguation systems that account for ambiguity.
Comment: Accepted to NAACL-HLT 2019
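The ambiguity-aware evaluation the abstract mentions can be illustrated with a short, hypothetical sketch (not the paper's actual metric; identifiers and scores are invented): instead of marking a prediction right or wrong against one gold frame, a system's predicted frame earns partial credit equal to the crowd's disagreement-based confidence in that frame.

```python
def soft_accuracy(predictions, crowd_scores):
    """Score predictions against crowd confidence, not a single gold frame.

    predictions: {pair_id: predicted_frame}
    crowd_scores: {pair_id: {frame: confidence in [0, 1]}}
    A prediction earns credit equal to the crowd's confidence in that frame,
    so choosing either of two genuinely ambiguous frames is not a hard error.
    """
    total = sum(crowd_scores[pid].get(frame, 0.0)
                for pid, frame in predictions.items())
    return total / len(predictions)

# Hypothetical pair where the crowd split between two overlapping frames:
scores = {"s1": {"Awareness": 0.6, "Certainty": 0.6}}
print(soft_accuracy({"s1": "Certainty"}, scores))  # 0.6, not 0.0
```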
Crowdsourcing Semantic Label Propagation in Relation Classification
Distant supervision is a popular method for performing relation extraction
from text, but it is known to produce noisy labels. Most progress in relation
extraction and classification has been made with crowdsourced corrections to
distant-supervised labels, and there is evidence that still more corrections
would help. In this paper, we explore the problem of propagating human
annotation signals gathered for open-domain relation classification through the
CrowdTruth crowdsourcing methodology, which captures ambiguity in
annotations by measuring inter-annotator disagreement. Our approach propagates
annotations to sentences that are similar in a low dimensional embedding space,
expanding the number of labels by two orders of magnitude. Our experiments show
significant improvement in a sentence-level multi-class relation classifier.
Comment: In publication at the First Workshop on Fact Extraction and Verification (FEVER) at EMNLP 2018
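A minimal sketch of the propagation idea (illustrative only; the paper's embedding model, similarity threshold, and label format are not specified here): sentence embeddings are compared by cosine similarity, and each unlabeled sentence inherits the relation scores of its nearest sufficiently similar labeled sentence.

```python
import numpy as np

def propagate_labels(labeled_vecs, labels, unlabeled_vecs, threshold=0.8):
    """Copy each labeled sentence's relation scores to unlabeled sentences
    that are close to it in embedding space (cosine similarity >= threshold)."""
    # Normalize rows so a dot product equals cosine similarity.
    L = labeled_vecs / np.linalg.norm(labeled_vecs, axis=1, keepdims=True)
    U = unlabeled_vecs / np.linalg.norm(unlabeled_vecs, axis=1, keepdims=True)
    sims = U @ L.T                       # shape: (n_unlabeled, n_labeled)
    propagated = {}
    for i, row in enumerate(sims):
        j = int(row.argmax())            # nearest labeled sentence
        if row[j] >= threshold:
            propagated[i] = labels[j]    # inherit its relation scores
    return propagated

# Hypothetical 3-dim embeddings and CrowdTruth-style relation scores:
labeled = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = [{"born-in": 0.9}, {"employs": 0.7}]
unlabeled = np.array([[0.95, 0.05, 0.0], [0.0, 0.2, 0.98]])
print(propagate_labels(labeled, labels, unlabeled))  # {0: {'born-in': 0.9}}
```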
Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study
Event detection is still a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve a lower F1-score than systems for other entity types, such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions, and propose a data-agnostic methodology for validating such datasets along both dimensions. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated datasets of events. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that crowd-annotated events increase the performance of the system by at least 5.3%.
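One way to picture the consistency check referred to above (a hypothetical reading, not the paper's actual procedure): scan the corpus for surface forms that experts marked as events in some sentences but left unannotated elsewhere, and flag them as candidates for crowd review.

```python
from collections import defaultdict

def find_inconsistent_mentions(corpus):
    """Flag tokens that are sometimes annotated as events and sometimes not.

    corpus: list of (token, is_event) pairs drawn from annotated sentences.
    Returns tokens whose annotations conflict across the dataset.
    """
    seen = defaultdict(set)
    for token, is_event in corpus:
        seen[token.lower()].add(is_event)
    return sorted(t for t, flags in seen.items() if len(flags) > 1)

corpus = [("attack", True), ("meeting", True), ("attack", False), ("said", False)]
print(find_inconsistent_mentions(corpus))  # ['attack']
```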
Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation
Big data is having a disruptive impact across the sciences. Human annotation of semantic interpretation tasks is a critical part of big data semantics, but it is based on an antiquated ideal of a single correct truth that needs to be similarly disrupted. We expose seven myths about human annotation, most of which derive from that antiquated ideal of truth, and dispel these myths with examples from our research. We propose a new theory of truth, crowd truth, based on the intuition that human interpretation is subjective, and that measuring annotations on the same objects of interpretation (in our examples, sentences) across a crowd provides a useful representation of their subjectivity and the range of reasonable interpretations.
Accurator: Nichesourcing for Cultural Heritage
With more and more cultural heritage data being published online, their
usefulness in this open context depends on the quality and diversity of
descriptive metadata for collection objects. In many cases, existing metadata
is not adequate for a variety of retrieval and research tasks and more specific
annotations are necessary. However, eliciting such annotations is a challenge
since it often requires domain-specific knowledge. While crowdsourcing can be
used successfully to elicit simple annotations, identifying people with the
required expertise may prove troublesome for tasks requiring more complex or
domain-specific knowledge. Nichesourcing addresses this problem by tapping
into the expert knowledge available in niche communities. This paper presents
Accurator, a methodology for conducting nichesourcing campaigns for cultural
heritage institutions, by addressing communities, organizing events and
tailoring a web-based annotation tool to a domain of choice. The contribution
of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation
tool for experts and 3) validation of the methodology and tool in three case
studies. The three domains of the case studies are birds on art, Bible prints
and fashion images. We compare the quality and quantity of obtained annotations
in the three case studies, showing that the nichesourcing methodology in
combination with the image annotation tool can be used to collect high-quality
annotations in a variety of domains and annotation tasks. A user evaluation
indicates that the tool is suitable and usable for domain-specific annotation tasks.
CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement
Typically, crowdsourcing-based approaches to gathering annotated data use
inter-annotator agreement as a measure of quality. However, in many domains
there is ambiguity in the data, as well as a multitude of perspectives on the
annotated examples. In this paper, we present ongoing work on the
CrowdTruth metrics, which capture and interpret inter-annotator disagreement in
crowdsourcing. The CrowdTruth metrics model the inter-dependency between the
three main components of a crowdsourcing system -- worker, input data, and
annotation. The goal of the metrics is to capture the degree of ambiguity in
each of these three components. The metrics are available online at
https://github.com/CrowdTruth/CrowdTruth-core
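As an illustration of what such metrics compute, here is a deliberately simplified, one-pass sketch. The actual CrowdTruth 2.0 metrics in the linked repository weight these scores by worker and annotation quality and iterate to a fixed point; this sketch omits that and is an assumption-laden illustration, not the library's API.

```python
import numpy as np

def crowdtruth_sketch(vectors):
    """Simplified disagreement-aware quality scores for one input unit.

    vectors: array of shape (n_workers, n_annotations); each row is one
    worker's binary annotation vector. The real metrics are mutually
    recursive over workers, units, and annotations; this is one pass.
    """
    unit_vec = vectors.sum(axis=0)
    # Unit-annotation score: fraction of workers selecting each annotation.
    unit_annotation = unit_vec / len(vectors)

    # Worker score: cosine between a worker's vector and the aggregate
    # of all *other* workers' vectors (low for outlier workers).
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    worker_quality = [cos(v, unit_vec - v) for v in vectors]
    return unit_annotation, worker_quality

# Three workers annotate one unit over three candidate annotations:
v = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 1]])
ua, wq = crowdtruth_sketch(v)
print(ua)  # approx. [0.667, 0.333, 0.333]
print(wq)  # first two workers agree; the third is the outlier (score 0)
```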