37 research outputs found

    Reliability measurement without limits

    In computational linguistics, a reliability measurement of 0.8 on some statistic such as κ is widely thought to guarantee that hand-coded data is fit for purpose, with lower values suspect. We demonstrate that the main use of such data, machine learning, can tolerate data with low reliability as long as any disagreement among human coders looks like random noise. When it does not, however, data can have a reliability of more than 0.8 and still be unsuitable for use: the disagreement may indicate erroneous patterns that machine learning can learn, and evaluation against test data that contain these same erroneous patterns may lead us to draw wrong conclusions about our machine-learning algorithms. Furthermore, lower reliability values still held as acceptable by many researchers, between 0.67 and 0.8, may even yield inflated performance figures in some circumstances. Although this is a common-sense result, it has implications for how we work that are likely to reach beyond the machine-learning applications we discuss. At the very least, computational linguists should look for any patterns in the disagreement among coders and assess what impact they will have.
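
    As a point of reference for the thresholds discussed above, here is a minimal sketch (not taken from the article) of how a chance-corrected agreement statistic such as Cohen's kappa is computed for two coders; the coders, labels, and data are invented for illustration.

    # Minimal sketch: Cohen's kappa for two hypothetical coders.
    from collections import Counter

    def cohens_kappa(coder_a, coder_b):
        """Chance-corrected agreement between two label sequences of equal length."""
        n = len(coder_a)
        # Observed agreement: fraction of items the coders label identically.
        observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
        # Expected agreement under independence, from each coder's label distribution.
        dist_a, dist_b = Counter(coder_a), Counter(coder_b)
        expected = sum(dist_a[lab] * dist_b[lab] for lab in dist_a) / n ** 2
        return (observed - expected) / (1 - expected)

    # Ten invented items labeled POS/NEG by two coders: observed agreement is 0.7,
    # expected agreement is 0.5, so kappa = (0.7 - 0.5) / (1 - 0.5) = 0.4,
    # well below the 0.8 rule of thumb the abstract questions.
    a = ["POS", "POS", "NEG", "POS", "NEG", "NEG", "POS", "NEG", "POS", "POS"]
    b = ["POS", "NEG", "NEG", "POS", "NEG", "POS", "POS", "NEG", "NEG", "POS"]
    print(cohens_kappa(a, b))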

    Subjective Machine Classifiers


    Inter-Coder Agreement for Computational Linguistics

    This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
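
    To make the contrast concrete, here is a small sketch (not from the article) scoring the same toy annotations with an unweighted, nominal coefficient and a weighted, alpha-like one, using NLTK's agreement module. The two coders, their 1-to-5 ratings, and the choice of interval distance as the weighting are all assumptions made here for illustration.

    # Sketch: the same annotations scored with a nominal and a weighted coefficient.
    from nltk.metrics.agreement import AnnotationTask
    from nltk.metrics.distance import binary_distance, interval_distance

    # Triples of (coder, item, label); every disagreement is one scale point apart.
    data = [
        ("c1", "i1", 1), ("c2", "i1", 1),
        ("c1", "i2", 2), ("c2", "i2", 3),
        ("c1", "i3", 4), ("c2", "i3", 4),
        ("c1", "i4", 5), ("c2", "i4", 4),
        ("c1", "i5", 1), ("c2", "i5", 2),
        ("c1", "i6", 3), ("c2", "i6", 3),
    ]

    # Nominal view: any mismatch is a full disagreement.
    nominal = AnnotationTask(data, distance=binary_distance)
    # Weighted view: near-misses on the ordinal scale are penalised less.
    weighted = AnnotationTask(data, distance=interval_distance)

    print("alpha, nominal distance :", round(nominal.alpha(), 3))
    print("alpha, interval distance:", round(weighted.alpha(), 3))

    On this toy data the interval-weighted alpha should come out noticeably higher than the nominal one, because every observed disagreement is a near-miss relative to the spread of the scale; which of the two better reflects annotation quality is exactly the kind of interpretation question the article raises.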

    Comparing linguistic annotations in Gramateca: philosophy, tools and examples

    This paper presents the general philosophy of Gramateca, an environment for corpus-based grammar studies of Portuguese, through three studies carried out within it: conditional connectives, words referring to the human body, and emotions in the language. The emphasis is on methodology, and the Rêve system, which allows revising and sharing annotations of the underlying corpora, is presented in detail. In describing the studies, we also report on the changes and improvements the tool has undergone, and discuss the kinds of questions and results already obtainable in very diverse areas.

    The System of Register Labels in plWordNet

    Stylistic registers influence word usage. Both traditional dictionaries and wordnets assign lexical units to registers, and there is a wide range of solutions. A system of register labels can be flat or hierarchical, with few labels or many, homogeneous or decomposed into sets of elementary features. We review the register label systems in lexicography, and then discuss our model, designed for plWordNet, a large wordnet for Polish. There follows a detailed comparative analysis of several register systems in Polish lexical resources. We also present the practical effect of adopting our flat, small and homogeneous system: a relatively high consistency of register assignment in plWordNet, as measured by inter-annotator agreement on a manageable sample. Large-scale conclusions for the whole of plWordNet remain to be drawn once the annotation has been completed, but the experience halfway through this labour-intensive exercise is very encouraging.

    Open Challenges in Treebanking: Some Thoughts Based on the Copenhagen Dependency Treebanks

    Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 1-13. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893

    On the Contextual Analysis of Agreement Scores


    Annotating Social Media Data From Vulnerable Populations: Evaluating Disagreement Between Domain Experts and Graduate Student Annotators

    Researchers in computer science have spent considerable time developing methods to increase the accuracy and richness of annotations. However, there is a dearth of research that examines the positionality of annotators, how they are trained, and what we can learn from disagreements between different groups of annotators. In this study, we use qualitative, statistical, and computational methods to compare annotations between Chicago-based domain experts and graduate students who annotated a total of 1,851 tweets with images, part of a larger corpus associated with the Chicago Gang Intervention Study, which aims to develop a computational system that detects aggression and loss among gang-involved youth in Chicago. We found evidence to support the study of disagreement between annotators and underscore the need for domain expertise when reviewing Twitter data from vulnerable populations. Implications for annotation and content moderation are discussed.

    Predicting worker disagreement for more effective crowd labeling

    Crowdsourcing is a popular mechanism for labeling tasks to produce large corpora for training. However, producing a reliable crowd-labeled training corpus is challenging and resource-intensive. Research on crowdsourcing has shown that label quality is strongly affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by the inherent ambiguity of the documents to be labeled. Such ambiguities are not known in advance, of course, but, once encountered by the workers, they lead to disagreement in the labeling, a disagreement that cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. We report on the findings of experiments we conducted on crowdsourcing a Twitter corpus for sentiment classification.
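
    The seed-then-route idea can be pictured with a rough sketch, which is not the authors' actual system: a predictor trained on a small seed set where worker disagreement has already been observed is used to split the remaining corpus. The TF-IDF features, the logistic-regression model, the example texts, and the 0.5 threshold are all assumptions made here for illustration.

    # Sketch of the seed-then-route idea; features, model and threshold are assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Seed documents with a binary flag: did the crowd disagree on this document?
    seed_texts = ["great phone, love it", "it is what it is, I guess",
                  "terrible battery life", "not bad, not great either"]
    seed_disagreed = [0, 1, 0, 1]  # 1 = workers disagreed on the sentiment label

    vectorizer = TfidfVectorizer()
    predictor = LogisticRegression().fit(vectorizer.fit_transform(seed_texts),
                                         seed_disagreed)

    def route(documents, threshold=0.5):
        """Split the unlabeled corpus: documents to crowd-label now versus documents
        to check for inherent ambiguity before spending worker effort on them."""
        p_disagree = predictor.predict_proba(vectorizer.transform(documents))[:, 1]
        to_label = [d for d, p in zip(documents, p_disagree) if p < threshold]
        to_review = [d for d, p in zip(documents, p_disagree) if p >= threshold]
        return to_label, to_review

    to_label, to_review = route(["best purchase ever", "well, that was something"])

    In such a setup the threshold trades annotation budget against the risk of wasting worker effort on inherently ambiguous documents, which is the trade-off the proposed framework is meant to manage.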