11 research outputs found

Evaluating MT for massive open online courses: a multifaceted comparison between PBSMT and NMT systems

Author: Castilho Sheila
Gaspari Federico
Georgakopoulou Panayota
Moorkens Joss
Sennrich Rico
Way Andy
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 30/07/2018

This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from Massive Open Online Courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm.

Irish Universities

DCU Online Research Access Service

Evaluating MT for massive open online courses

Author: Castilho Sheila
Gaspari Federico
Georgakopoulou Panayota
Moorkens Joss
Sennrich Rico
Way Andy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from massive open online courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm.

Archivio della ricerca - Università degli studi di Napoli Federico II

Edinburgh Research Explorer

Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain

Author: Esplà-Gomis Miquel (ed.)
Forcada Mikel L. (ed.)
Martins André (ed.)
Popović Maja (ed.)
Pérez-Ortiz Juan Antonio (ed.)
Rico Celia (ed.)
Sánchez-Martínez Felipe (ed.)
Van den Bogaert Joachim (ed.)
Publication venue: European Association for Machine Translation
Publication date: 01/01/2018

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Professional translators’ and project managers’ perceptions of machine translation and post-editing: a survey study

Author: Dorst A.G.
Jongste D.
Valdez S.
Publication date: 18/03/2023
Field of study

Descriptive and Comparative Linguistic

Leiden University Scholarly Publications

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

Author: Edman Lukas
Noord van, Gertjan
Toral Ruiz Antonio
Publication date: 01/01/2020

Dissertations of the University of Groningen

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

Author: Edman Lukas
Noord van, Gertjan
Toral Ruiz Antonio
Publication date: 01/01/2020

ARTS repository - University of Groningen

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

Author: Edman Lukas
Noord van, Gertjan
Toral Ruiz Antonio
Publication date: 01/01/2020

Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embedding

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Automatic Speech Recognition (ASR) and NMT for Interlingual and Intralingual Communication: Speech to Text Technology for Live Subtitling and Accessibility.

Author: Gregori Alessandro <1975>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 26/10/2021

Considering the increasing demand for institutional translation and the multilingualism of international organizations, the application of Artificial Intelligence (AI) technologies in multilingual communications and for the purposes of accessibility has become an important element in the production of translation and interpreting services (Zetzsche, 2019). In particular, the widespread use of Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) technology represents a recent development in the attempt to satisfy the increasing demand for interinstitutional, multilingual communications at inter-governmental level (Maslias, 2017). Recently, researchers have been calling for a universalistic view of media and conference accessibility (Greco, 2016). The application of ASR, combined with NMT, may allow for the breaking down of communication barriers at European institutional conferences where multilingualism represents a fundamental pillar (Jopek Bosiacka, 2013). In addition to representing a so-called disruptive technology (Accipio Consulting, 2006), ASR technology may facilitate communication with non-hearing users (Lewis, 2015). Thanks to ASR, it is possible to guarantee content accessibility for non-hearing audiences via subtitles at institutionally-held conferences or speeches. Hence the need for analysing and evaluating ASR output: a quantitative approach is adopted to evaluate subtitles, with the objective of assessing their accuracy (Romero-Fresco, 2011). A database of F.A.O.’s and other international institutions’ English-language speeches and conferences on climate change is taken into consideration. The statistical approach is based on WER and NER models (Romero-Fresco, 2016) and on an adapted version. The ASR software solutions used in the study are VoxSigma by Vocapia Research and the Google Speech Recognition engine.
After having defined a taxonomic scheme, Native and Non-Native subtitles are compared to gold standard transcriptions. The intralingual and interlingual output generated by NMT is specifically analysed and evaluated via the NTR model to evaluate accuracy and accessibility.

AMS Tesi di Dottorato

Evaluating MT for massive open online courses: a multifaceted comparison between PBSMT and NMT systems

Author: Castilho Sheila
Gaspari Federico
Georgakopoulou Panayota
Moorkens Joss
Sennrich Rico
Way Andy
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 30/07/2018

Irish Universities

Simplifying, reading, and machine translating health content: an empirical investigation of usability

Author: Rossetti Alessandra
Publication venue: Dublin City University. Centre for Translation and Textual Studies (CTTS)
Publication date: 01/11/2019

Text simplification, through plain language (PL) or controlled language (CL), is adopted to increase readability, comprehension and machine translatability of (health) content. Cochrane is a non-profit organisation where volunteer authors summarise and simplify health-related English texts on the impact of treatments and interventions into plain language summaries (PLS), which are then disseminated online to the lay audience and translated. Cochrane’s simplification approach is non-automated, and involves the manual checking and implementation of different sets of PL guidelines, which can be an unsatisfactory, challenging and time-consuming task. This thesis examined if using the Acrolinx CL checker to automatically and consistently check PLS for readability and translatability issues would increase the usability of Cochrane’s simplification approach and, more precisely: (i) authors’ satisfaction; and (ii) authors’ effectiveness in terms of readability, comprehensibility, and machine translatability into Spanish. Data on satisfaction were collected from twelve Cochrane authors by means of the System Usability Scale and follow-up preference questions. Readability was analysed through the computational tool Coh-Metrix. Evidence on comprehensibility was gathered through ratings and recall protocols produced by lay readers, both native and non-native speakers of English. Machine translatability was assessed in terms of adequacy and fluency with forty-one Cochrane contributors, all native speakers of Spanish. Authors seemed to welcome the introduction of Acrolinx, and the adoption of this CL checker reduced word length, sentence length, and syntactic complexity. No significant impact on comprehensibility and machine translatability was identified. We observed that reading skills and characteristics other than simplified language (e.g. formatting) might influence comprehension. Machine translation quality was relatively high, with mainly style issues. 
This thesis presented an environment that could boost volunteer authors’ satisfaction and foster their adoption of simple language. We also discussed strategies to increase the accessibility of online health content among lay readers with different skills and language backgrounds.

Irish Universities

DCU Online Research Access Service