
3,996 research outputs found

An introduction to crowdsourcing for language and multimedia technology research

Author: A. Doan
C. Callison-Burch
C. Rashtchian
G. Paolacci
G. Pickard
J. Ross
L. von Ahn
M. Larson
O. Alonso
R. Snow
S. Novotney
T. Yan
V.C. Raykar
V.S. Sheng
W. Mason
W. Willett
W.S. Lasecki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Language and multimedia technology research often relies on large, manually constructed datasets for training or evaluating algorithms and systems. Constructing these datasets is often expensive, and recruiting personnel to carry out the work poses significant challenges. Crowdsourcing methods that draw on scalable pools of workers available on demand offer a flexible means of rapid, low-cost construction of many of these datasets, supporting existing research requirements and potentially promoting new research initiatives that would otherwise not be possible.

Crossref

Irish Universities

DCU Online Research Access Service

Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use

Author: Adda Gilles
Couillault Alain
Fort Karen
Mariani Joseph
Sagot Benoît
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/07/2014

This article is a position paper about Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years. According to the mainstream opinion expressed in articles in the field, this type of online working platform makes it possible to quickly develop all sorts of quality language resources, at a very low price, using people who do the work as a hobby. We shall demonstrate here that the situation is far from that ideal. Our goals here are manifold: (1) to inform researchers, so that they can make their own choices; (2) to develop alternatives with the help of funding agencies and scientific associations; (3) to propose practical and organizational solutions in order to improve language resource development, while limiting the risks of ethical and legal issues without sacrificing price or quality; (4) to introduce an Ethics and Big Data Charter for the documentation of language resources.

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

A Template Based Approach for Training NMT for Low-Resource Uralic Languages - A Pilot with Finnish

Author: Evert Stefan
Mayer Thomas
Munteanu Dragos Stefan
Smedt Tom De
Publication venue: ACM
Publication date: 01/12/2019

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto