Comparing Bayesian Models of Annotation
The analysis of crowdsourced annotations in NLP is concerned with identifying 1) gold standard labels, 2) annotator accuracies and biases, and 3) item difficulties and error patterns. Traditionally, majority voting was used for 1), and coefficients of agreement for 2) and 3). Lately, model-based analyses of corpus annotations have proven better at all three tasks. But there has been relatively little work comparing them on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation.
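The models compared here are described only at a high level; as a concrete, minimal illustration of the family, the sketch below fits a Dawid-Skene-style aggregator with per-annotator confusion matrices using EM point estimates rather than full Bayesian inference. The function name, data layout, and EM procedure are assumptions for illustration, not the paper's models or code.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """EM aggregation of crowd labels with per-annotator confusion matrices.

    labels: list of (item_id, annotator_id, label) triples, labels in range(n_classes).
    Returns (item posteriors over classes, per-annotator confusion matrices).
    """
    items = sorted({i for i, _, _ in labels})
    annotators = sorted({a for _, a, _ in labels})
    item_idx = {i: k for k, i in enumerate(items)}
    ann_idx = {a: k for k, a in enumerate(annotators)}

    # Initialise item posteriors with soft majority voting.
    T = np.zeros((len(items), n_classes))
    for i, a, l in labels:
        T[item_idx[i], l] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and confusion matrices (rows: true class, cols: observed label).
        priors = T.mean(axis=0)
        conf = np.full((len(annotators), n_classes, n_classes), 1e-6)
        for i, a, l in labels:
            conf[ann_idx[a], :, l] += T[item_idx[i]]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute item posteriors under the current parameters.
        log_T = np.tile(np.log(priors + 1e-12), (len(items), 1))
        for i, a, l in labels:
            log_T[item_idx[i]] += np.log(conf[ann_idx[a], :, l])
        T = np.exp(log_T - log_T.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)

    return T, conf
```

The six models in the paper build on this basic idea with priors over the parameters, pooling (tying) across annotators and items, and item-difficulty terms.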
Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning
Supervised learning assumes that a ground truth label exists. However, the reliability of this ground truth depends on human annotators, who often disagree. Prior work has shown that this disagreement can be helpful in training models. We propose a novel method to incorporate this disagreement as information: in addition to the standard error computation, we use soft labels (i.e., probability distributions over the annotator labels) as an auxiliary task in a multi-task neural network. We measure the divergence between the predictions and the target soft labels with several loss functions and evaluate the models on various NLP tasks. We find that the soft-label prediction auxiliary task reduces the penalty for errors on ambiguous entities and thereby mitigates overfitting. It significantly improves performance across tasks beyond the standard approach and prior work.
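A minimal sketch of how such a soft-label auxiliary task could be wired into a multi-task network, assuming PyTorch; the class, the loss weighting, and the choice of KL divergence are illustrative assumptions (the paper evaluates several divergence losses and tasks), not the authors' implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftLabelMultiTask(nn.Module):
    """Shared encoder with a hard-label head and a soft-label auxiliary head."""

    def __init__(self, encoder, hidden_dim, n_classes):
        super().__init__()
        self.encoder = encoder                      # any module mapping inputs to hidden_dim
        self.hard_head = nn.Linear(hidden_dim, n_classes)
        self.soft_head = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        h = self.encoder(x)
        return self.hard_head(h), self.soft_head(h)

def multitask_loss(hard_logits, soft_logits, gold, soft_targets, alpha=0.5):
    """Standard cross-entropy plus a divergence between the predicted and the
    annotator label distributions (KL divergence here, as one possible choice)."""
    ce = F.cross_entropy(hard_logits, gold)
    kl = F.kl_div(F.log_softmax(soft_logits, dim=-1), soft_targets,
                  reduction="batchmean")
    return ce + alpha * kl
```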
CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies
Various NLP tasks require a complex hierarchical structure over nodes, where
each node is a cluster of items. Examples include generating entailment graphs,
hierarchical cross-document coreference resolution, annotating event and
subevent relations, etc. To enable efficient annotation of such hierarchical
structures, we release CHAMP, an open-source tool for incrementally
constructing both clusters and their hierarchy simultaneously over any type of text.
This incremental approach significantly reduces annotation time compared to the
common pairwise annotation approach and also guarantees maintaining
transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a
consolidation mode, where an adjudicator can easily compare multiple cluster
hierarchy annotations and resolve disagreements.
Comment: EMNLP 202
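As a rough illustration of why an incremental cluster-and-hierarchy structure keeps transitivity by construction, the sketch below maintains clusters as item sets with parent links between them; every name here is invented for illustration, and this is not CHAMP's implementation.

```python
class ClusterHierarchy:
    """Clusters of items plus parent links between clusters, built incrementally.

    Membership and ancestry are always derived from the current structure, so
    transitivity holds at both the cluster and hierarchy levels by construction.
    """

    def __init__(self):
        self.cluster_of = {}   # item -> cluster id
        self.members = {}      # cluster id -> set of items
        self.parent = {}       # cluster id -> parent cluster id
        self._next_id = 0

    def add_item(self, item):
        if item not in self.cluster_of:
            cid, self._next_id = self._next_id, self._next_id + 1
            self.cluster_of[item] = cid
            self.members[cid] = {item}
        return self.cluster_of[item]

    def merge(self, a, b):
        """Place two items in the same cluster."""
        ca, cb = self.add_item(a), self.add_item(b)
        if ca == cb:
            return ca
        for item in self.members.pop(cb):
            self.cluster_of[item] = ca
            self.members[ca].add(item)
        # Re-point hierarchy links that referenced the absorbed cluster
        # (corner cases such as merging a cluster with its own parent are
        # not handled in this sketch).
        if cb in self.parent:
            self.parent.setdefault(ca, self.parent.pop(cb))
        for child, par in list(self.parent.items()):
            if par == cb:
                self.parent[child] = ca
        return ca

    def attach(self, child_item, parent_item):
        """Make the child's cluster a sub-cluster of the parent's cluster."""
        child, parent = self.add_item(child_item), self.add_item(parent_item)
        if child != parent and child not in self._ancestors(parent):
            self.parent[child] = parent

    def ancestors_of(self, item):
        """All clusters above the item's cluster (ancestry is transitive)."""
        return self._ancestors(self.add_item(item))

    def _ancestors(self, cid):
        chain = []
        while cid in self.parent:
            cid = self.parent[cid]
            chain.append(cid)
        return chain
```

Because membership and ancestry are always read off one shared structure, intransitive annotations cannot arise, something that independent pairwise decisions would otherwise need a separate consistency check to guarantee.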
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
Labelled data is the foundation of most natural language processing tasks.
However, labelling data is difficult and there often are diverse valid beliefs
about what the correct data labels should be. So far, dataset creators have
acknowledged annotator subjectivity, but rarely actively managed it in the
annotation process. This has led to partly-subjective datasets that fail to
serve a clear downstream use. To address this issue, we propose two contrasting
paradigms for data annotation. The descriptive paradigm encourages annotator
subjectivity, whereas the prescriptive paradigm discourages it. Descriptive
annotation allows for the surveying and modelling of different beliefs, whereas
prescriptive annotation enables the training of models that consistently apply
one belief. We discuss benefits and challenges in implementing both paradigms,
and argue that dataset creators should explicitly aim for one or the other to
facilitate the intended use of their dataset. Lastly, we conduct an annotation
experiment using hate speech data that illustrates the contrast between the two
paradigms.
Comment: Accepted at NAACL 2022 (Main Conference)
A Bayesian Approach for Sequence Tagging with Crowds
Current methods for sequence tagging, a core task in NLP, are data hungry,
which motivates the use of crowdsourcing as a cheap way to obtain labelled
data. However, annotators are often unreliable and current aggregation methods
cannot capture common types of span annotation errors. To address this, we
propose a Bayesian method for aggregating sequence tags that reduces errors by
modelling sequential dependencies between the annotations as well as the
ground-truth labels. By taking a Bayesian approach, we account for uncertainty
in the model due to both annotator errors and the lack of data for modelling
annotators who complete few tasks. We evaluate our model on crowdsourced data
for named entity recognition, information extraction and argument mining,
showing that our sequential model outperforms the previous state of the art. We
also find that our approach can reduce crowdsourcing costs through more
effective active learning, as it better captures uncertainty in the sequence
labels when there are few annotations.
Comment: Accepted for EMNLP 201
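The sketch below is not the paper's Bayesian model but a simplified, fixed-parameter illustration of the sequential-dependency idea: given assumed per-annotator confusion matrices and a transition matrix over true tags, it decodes a MAP tag sequence with Viterbi. In the paper these quantities instead receive priors and are inferred jointly, which is where the robustness to annotators who complete few tasks comes from.

```python
import numpy as np

def map_sequence(annotations, confusion, transition, prior):
    """Viterbi MAP decoding of the true tag sequence from several annotators.

    annotations: dict annotator_id -> list of observed tag ids (one per token)
    confusion:   dict annotator_id -> (n_tags, n_tags) array; rows are true
                 tags, columns are the tags that annotator tends to produce
    transition:  (n_tags, n_tags) transition probabilities between true tags
    prior:       (n_tags,) initial distribution over true tags
    """
    n_tokens = len(next(iter(annotations.values())))
    n_tags = len(prior)
    log_trans = np.log(transition)

    def emission(t):
        # Log-likelihood of every annotator's label at token t, per true tag.
        e = np.zeros(n_tags)
        for a, seq in annotations.items():
            e += np.log(confusion[a][:, seq[t]])
        return e

    delta = np.log(prior) + emission(0)
    back = np.zeros((n_tokens, n_tags), dtype=int)
    for t in range(1, n_tokens):
        scores = delta[:, None] + log_trans     # (previous tag, current tag)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emission(t)

    # Trace the best path backwards from the best final tag.
    path = [int(delta.argmax())]
    for t in range(n_tokens - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

For BIO-style tagging, the transition matrix can assign near-zero probability to invalid transitions such as an I- tag directly after O, which is exactly the kind of span-level error structure that token-independent aggregation ignores.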
HuCurl: Human-induced Curriculum Discovery
We introduce the problem of curriculum discovery and describe a curriculum
learning framework capable of discovering effective curricula in a curriculum
space based on prior knowledge about sample difficulty. Using annotation
entropy and loss as measures of difficulty, we show that (i) the
top-performing discovered curricula for a given model and dataset are often
non-monotonic, unlike the monotonic curricula in the existing literature,
(ii) the prevailing easy-to-hard or hard-to-easy transition curricula often
risk underperforming, and (iii) curricula discovered for smaller datasets
and models perform well on larger datasets and models, respectively.
The proposed framework encompasses some of the existing curriculum learning
approaches and can discover curricula that outperform them across several NLP
tasks.
Comment: In Proceedings of the 61st Annual Meeting of the Association for
Computational Linguistics (ACL)
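As a small, assumption-laden illustration of the difficulty measures and of what a non-monotonic curriculum looks like operationally, the sketch below orders training items by annotation entropy according to a hand-written schedule. It is not HuCurl's discovery procedure, which searches a curriculum space rather than taking the schedule as given.

```python
import numpy as np

def annotation_entropy(label_counts):
    """Per-item difficulty: entropy of the annotator label distribution.

    label_counts: (n_items, n_classes) array of annotation counts per item.
    """
    p = label_counts / label_counts.sum(axis=1, keepdims=True)
    return -np.sum(np.where(p > 0, p * np.log(p), 0.0), axis=1)

def curriculum_order(difficulty, schedule):
    """Order training items according to a (possibly non-monotonic) curriculum.

    difficulty: (n_items,) difficulty scores, e.g. annotation entropy or loss.
    schedule:   list of (fraction, direction) stages; 'easy' takes the easiest
                remaining items, 'hard' the hardest.
    """
    remaining = list(np.argsort(difficulty))          # easiest -> hardest
    n, order = len(remaining), []
    for fraction, direction in schedule:
        k = min(int(round(fraction * n)), len(remaining))
        chunk = remaining[:k] if direction == "easy" else remaining[len(remaining) - k:]
        order.extend(chunk)
        taken = set(chunk)
        remaining = [i for i in remaining if i not in taken]
    order.extend(remaining)                           # anything the schedule left over
    return order

# A monotonic easy-to-hard curriculum vs. a non-monotonic one (scores is a
# hypothetical array of per-item difficulties):
# easy_to_hard = curriculum_order(scores, [(1.0, "easy")])
# non_monotonic = curriculum_order(scores, [(0.4, "hard"), (0.3, "easy"), (0.3, "hard")])
```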