A Crowdsourcing Platform for Italian Linguistic Field Research

Author: Bry François
Kneissl Fabian
Krefeld Thomas
Lücke Stephan
Wieser Christoph
Publication date: 01/01/2013

Borsa Parole – A Market for Linguistic Speculation

Author: Bry François
Kneißl Fabian
Publication date: 01/01/2012

This article describes a novel approach to linguistic field research that exploits the self-regulation of a market to collect data on language use. The market is conceived as an output-agreement game with a purpose called Borsa Parole. Players can trade the agreement, which makes it adjustable. Borsa Parole has been conceived and is deployed for a linguistic study on the divergence of Italian dialects and vernaculars.

CiteSeerX

Open Access LMU

Creating and Exploiting Annotated Corpora

Author: Vidová Hladká Barbora
Publication date: 20/05/2021

CU Digital Repository

Gamifying Language Resource Acquisition

Author: Madge Christopher James
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2020

PhD Thesis. Natural Language Processing is an important collection of methods for processing the vast amounts of natural language text we continually produce. These methods make use of supervised learning, an approach that learns from large amounts of annotated data. As humans, we are able to provide information about text that such systems can learn from. Historically, this annotation was carried out by small groups of experts, which did not scale. This led to various crowdsourcing approaches that used large pools of non-experts. The traditional form of crowdsourcing was to pay users small amounts of money to complete tasks. As time progressed, gamification approaches such as games with a purpose (GWAPs) showed various benefits over the micro-payment methods used before. These included cost savings, worker training opportunities, increased worker engagement, and the potential to far exceed the scale of crowdsourcing. While these approaches were successful in domains such as image labelling, they struggled in the domain of text annotation, which was not such a natural fit. Despite many challenges, there were also clearly many opportunities and benefits to applying this approach to text annotation. Many of these are demonstrated by Phrase Detectives. Based on lessons learned from Phrase Detectives and investigations into other GWAPs, in this work we attempt to create full GWAPs for NLP, extracting the benefits of the methodology. This includes training, high-quality output from non-experts, and a truly game-like GWAP design that players are happy to play voluntarily.

Queen Mary Research Online

Inter-Coder Agreement for Computational Linguistics

Author: Atkins Sue
Carletta Jean
Grosz Barbara J
Hearst Marti A
Krippendorff Klaus
Marcus Mitchell P
Passonneau Rebecca J
Poesio Massimo
Reinhart T.
Artstein Ron
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2008

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
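
As a rough illustration of two of the coefficients named in this abstract, the sketch below computes Scott's pi and Cohen's kappa for two annotators using the standard textbook definitions. It is not taken from the article itself; the function names and the toy labels are assumptions made for this example. Both coefficients have the form (A_o - A_e) / (1 - A_e) and differ only in how the expected agreement A_e is estimated.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: expected agreement uses each annotator's own label distribution."""
    n = len(a)
    a_o = observed_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    a_e = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    # Undefined (division by zero) when a_e == 1, i.e. only one label is ever used.
    return (a_o - a_e) / (1 - a_e)


def scott_pi(a, b):
    """Scott's pi: expected agreement uses one label distribution pooled over both annotators."""
    n = len(a)
    a_o = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    a_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    # Undefined (division by zero) when a_e == 1, i.e. only one label is ever used.
    return (a_o - a_e) / (1 - a_e)


# Hypothetical toy annotations: five items, two labels.
annotator_1 = ["yes", "yes", "yes", "no", "no"]
annotator_2 = ["yes", "yes", "no", "no", "no"]

print(round(cohen_kappa(annotator_1, annotator_2), 3))
print(round(scott_pi(annotator_1, annotator_2), 3))
```

The two values differ here because the annotators' individual label distributions differ, which is the kind of underlying assumption the survey discusses.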

University of Essex Research Repository

CiteSeerX

Crossref