79 research outputs found

Towards predicting post-editing productivity

Author: A Duchowski
I Garcia
J White
K Rayner
L Specia
M Just
M Plitt
R Radach
S O’Brien
Sharon O’Brien
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Machine translation (MT) quality is generally measured via automatic metrics, producing scores that have no meaning for translators who are required to post-edit MT output or for project managers who have to plan and budget for translation projects. This paper investigates correlations between two such automatic metrics (general text matcher and translation edit rate) and post-editing productivity. For the purposes of this paper, productivity is measured via processing speed and cognitive measures of effort using eye tracking as a tool. Processing speed, average fixation time and count are found to correlate well with the scores for groups of segments. Segments with high GTM and TER scores require substantially less time and cognitive effort than medium or low-scoring segments. Future research involving score thresholds and confidence estimation is suggested.

Crossref

Irish Universities

DCU Online Research Access Service

A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

Author: Bisazza Arianna
Federico Marcello
Publication venue: 'MIT Press - Journals'
Publication date: 14/03/2016

Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

UvA-DARE

International Migration, Integration and Social Cohesion online publications

MATREX: the DCU MT System for WMT 2008

Author: Ma Yanjun
Ozdowska Sylwia
Tinsley John
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2008

In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data-driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish–English and French–English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech–English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Further meta-evaluation of machine translation

Author: Callison-Burch Chris
Fordyce Cameron
Koehn Philipp
Monz Christof
Schroeder Josh
Publication date: 01/01/2008

Edinburgh Research Explorer

The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation

Author: Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
Hernández Huerta Adolfo
Mariño Acebal José Bernardo
Monte Moreno Enrique
Rodríguez Fonollosa José Adrián
Publication date: 01/01/2012

This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically-rich languages such as Spanish by translating first to a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement on TER, METEOR, NIST and BLEU scores compared to our baseline system, obtaining on the official test set more benefits from the domain adaptation approach than from the morphological generalization method. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

English-to-Czech MT: Large Data and Beyond

Author: Bojar Ondřej
Publication date: 06/12/2018

CU Digital Repository

Deep evaluation of hybrid architectures: simple metrics correlated with human judgments

Author: Díaz de Ilarraza Sánchez Arantza
España Bonet Cristina
Labaka Gorka
Màrquez Villodre Lluís
Sarasola Gabiola Kepa
Publication date: 01/01/2011

The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare just the single statistical system and the hybrid one; the rule-based system was not compared by hand because the results of automatic evaluation showed a clear disadvantage. A second and wider evaluation experiment, however, surprisingly showed that according to human evaluation the best system was the rule-based one, the system that achieved the worst results using automatic evaluation. An examination of sentences with controversial results suggested that linguistic well-formedness in the output should be considered in evaluation. After experimenting with 6 possible metrics, we conclude that a simple arithmetic mean of BLEU and BLEU calculated on parts of speech of words is clearly a more human-conformant metric than lexical metrics alone. Peer Reviewed. Postprint (author’s final draft)

UPCommons. Portal del coneixement obert de la UPC

Analyzing Error Types in English-Czech Machine Translation

Author: Bojar Ondřej
Publication date: 01/01/2011

This paper examines two techniques of manual evaluation that can be used to identify error types of individual machine translation systems. The first technique of “blind post-editing” has been used in WMT evaluation campaigns since 2009 and manually constructed data of this type are available for various language pairs. The second technique of explicit marking of errors has been used in the past as well. We propose a method for interpreting blind post-editing data at a finer level and compare the results with explicit marking of errors. While the human annotation of either of the techniques is not exactly reproducible (relatively low agreement), both techniques lead to similar observations of differences of the systems. Specifically, we are able to suggest which errors in MT output are easy and hard to correct with no access to the source, a situation experienced by users who do not understand the source language.

CiteSeerX

Biblio at Institute of Formal and Applied Linguistics

Improving English to Spanish out-of-domain translations by morphology generalization and generation

Author: Formiga Fanals Lluís
Hernández Huerta Adolfo
Mariño Acebal José Bernardo
Monte Moreno Enrique
Publication date: 01/01/2012

This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor quality translation when translating out-of-domain. In detail, this approach first translates into Spanish simplified forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source and target-language sentences. Obtained results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC