
    Translation crowdsourcing: creating a multilingual corpus of online educational content

    The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the process, several challenges arose, mainly involving the in-domain text genre, the large text volume, the idiosyncrasies of each target language, the limitations of the crowdsourcing platform, and the quality-assurance and workflow issues of the crowdsourcing process. The corpus is a product of the EU-funded TraMOOC project and is used within the project to train, tune, and test machine translation engines.

    Enhancing Access to Online Education: Quality Machine Translation of MOOC Content

    The International Conference on Language Resources and Evaluation (LREC) 2016, 23 May 201

    Improving Machine Translation of Educational Content via Crowdsourcing

    The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collection is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations may be of lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English into eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. Our results indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over both general-domain baseline systems and systems fine-tuned with pre-existing in-domain corpora.
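    The abstract above mentions "carefully designed quality controls" for crowdsourced translations without detailing them. One common control in crowdsourcing pipelines is embedding gold-standard test segments among the real tasks and discarding workers whose accuracy on those checks falls below a threshold. The sketch below illustrates that general idea only; the function names, data shapes, and the 0.8 threshold are hypothetical and are not taken from the paper.

    ```python
    from collections import defaultdict

    def filter_workers(submissions, gold, min_accuracy=0.8):
        """Keep non-gold translations from workers who pass enough
        embedded gold-standard checks.

        submissions: list of (worker_id, segment_id, translation)
        gold: dict mapping test segment_id -> accepted translation
        """
        correct = defaultdict(int)
        attempted = defaultdict(int)
        for worker, seg, text in submissions:
            if seg in gold:
                attempted[worker] += 1
                if text.strip() == gold[seg].strip():
                    correct[worker] += 1
        # A worker is trusted only if their gold-check accuracy clears the bar
        trusted = {w for w in attempted
                   if correct[w] / attempted[w] >= min_accuracy}
        # Return only real (non-gold) work from trusted workers
        return [(w, s, t) for (w, s, t) in submissions
                if w in trusted and s not in gold]

    subs = [
        ("w1", "g1", "Hallo Welt"),    # gold check: correct
        ("w1", "s1", "Guten Morgen"),  # real segment, kept
        ("w2", "g1", "Falsch"),        # gold check: wrong
        ("w2", "s2", "Schlecht"),      # real segment, discarded
    ]
    gold = {"g1": "Hallo Welt"}
    print(filter_workers(subs, gold))
    # → [('w1', 's1', 'Guten Morgen')]
    ```

    In practice such checks are combined with other signals (inter-worker agreement, completion time, automatic language identification); exact string match on gold segments is the simplest variant and would be relaxed for free-form translation tasks.
    
    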

    Situating language register across the ages, languages, modalities, and cultural aspects: Evidence from complementary methods

    In the present review paper by members of the collaborative research center "Register: Language Users' Knowledge of Situational-Functional Variation" (CRC 1412), we assess the pervasiveness of register phenomena across different time periods, languages, modalities, and cultures. We define "register" as recurring variation in language use depending on the function of language and on the social situation. Informed by rich data, we aim to better understand and model the knowledge involved in situation- and function-based use of language register. To achieve this goal, we use complementary methods and measures. In the review, we start by clarifying the concept of "register", reviewing the state of the art, and setting out our methods and modeling goals. Against this background, we discuss three key challenges, two at the methodological level and one at the theoretical level: (1) To better uncover registers in text and spoken corpora, we propose changes to established analytical approaches. (2) To tease apart between-subject variability from the linguistic variability at issue (intra-individual situation-based register variability), we use within-subject designs and model individuals' social, language, and educational backgrounds. (3) We highlight a gap in cognitive modeling, viz. modeling the mental representations of register (processing), and present our first attempts at filling this gap. We argue that the targeted use of multiple complementary methods and measures supports investigating the pervasiveness of register phenomena and yields comprehensive insights into the cross-methodological robustness of register-related language variability. These insights in turn provide a solid foundation for associated cognitive modeling. Peer Reviewed