
    Findings of the 2015 Workshop on Statistical Machine Translation

    This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams submitting 7 entries.

    Discriminative ridge regression algorithm for adaptation in statistical machine translation

    We present a simple and reliable method for estimating the log-linear weights of a state-of-the-art machine translation system, which takes advantage of the method known as discriminative ridge regression (DRR). Since inappropriate weight estimations lead to a wide variability of translation quality results, reaching a reliable estimate for such weights is critical for machine translation research. For this reason, a variety of methods have been proposed to reach reasonable estimates. In this paper, we present an algorithmic description and empirical results showing that DRR is able to provide translation quality comparable to that of state-of-the-art estimation methods (i.e., MERT and MIRA), with a reduction in computational cost. Moreover, the empirical results reported are coherent across different corpora and language pairs. The research leading to these results was partially supported by projects CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER) and PROMETEO/2018/004. We also acknowledge NVIDIA for the donation of a GPU used in this work.
    Chinea-Ríos, M.; Sanchis-Trilles, G.; Casacuberta Nolla, F. (2019). Discriminative ridge regression algorithm for adaptation in statistical machine translation. Pattern Analysis and Applications 22(4):1293–1305. https://doi.org/10.1007/s10044-018-0720-5
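    The gist of the approach can be sketched in a few lines. The sketch below is a minimal, hypothetical rendering in NumPy, assuming n-best lists with per-hypothesis feature vectors and quality scores (e.g., sentence-level BLEU): regress score differences onto feature differences within each n-best list and solve the ridge system in closed form. The data layout and argument names are illustrative assumptions, not the authors' exact formulation.

        import numpy as np

        def ridge_weights(features, scores, lam=1.0):
            """Estimate log-linear weights by ridge regression over n-best lists.

            features: list of (n_i, d) arrays of model feature vectors, one
                      array per source sentence's n-best list.
            scores:   list of (n_i,) arrays of quality scores (e.g. sentence
                      BLEU), aligned with the feature rows.
            lam:      ridge regularisation strength.
            """
            X_rows, y_rows = [], []
            for F, s in zip(features, scores):
                best = int(np.argmax(s))        # oracle hypothesis in the list
                X_rows.append(F - F[best])      # feature-vector differences
                y_rows.append(s - s[best])      # quality-score differences
            X, y = np.vstack(X_rows), np.concatenate(y_rows)
            # Closed-form ridge solution: w = (X'X + lam*I)^{-1} X'y
            return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    The estimated weight vector would then play the role that MERT- or MIRA-tuned weights play in the decoder's log-linear model.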

    Discourse Structure in Machine Translation Evaluation

    In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics with regard to correlation with human judgments at both the segment and the system level. This suggests that discourse information is complementary to the information used by many existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTK-party. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular, we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality. Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201
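    To make the mechanics concrete, here is a toy sketch of an all-subtree (Collins-Duffy style) convolution kernel over parse trees, the ingredient such discourse-aware measures build on. The (label, children) tuple encoding, the decay parameter, and the example labels are illustrative assumptions, not the paper's implementation.

        def subtree_kernel(t1, t2, decay=0.5):
            """Toy all-subtree (Collins-Duffy) kernel over parse trees.

            Trees are (label, [children]) tuples; for RST trees the labels
            would encode nuclearity and/or relation type.
            """
            def nodes(t):
                yield t
                for child in t[1]:
                    yield from nodes(child)

            def delta(a, b):
                # Nodes match only if label and child-label sequence coincide.
                if a[0] != b[0] or [c[0] for c in a[1]] != [c[0] for c in b[1]]:
                    return 0.0
                if not a[1]:                     # matching leaves
                    return decay
                prod = decay
                for ca, cb in zip(a[1], b[1]):
                    prod *= 1.0 + delta(ca, cb)
                return prod

            return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

        # Example: identical two-EDU elaboration subtrees share fragments.
        t = ("NS-Elaboration", [("EDU", []), ("EDU", [])])
        print(subtree_kernel(t, t))

    A normalized similarity such as K(t1, t2) / sqrt(K(t1, t1) * K(t2, t2)) can then be linearly combined with the scores of existing evaluation metrics.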

    Machine translation evaluation resources and methods: a survey

    We introduce a Machine Translation (MT) evaluation survey that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: the lexical-similarity scenario and the application of linguistic features. The lexical-similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic and semantic features. The syntactic features include part-of-speech tags, phrase types, and sentence structures, and the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only recently. Subsequently, we also introduce the methods for evaluating MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works (GALE program 2009; EuroMatrix project 2007) in several respects: it introduces recent developments in MT evaluation measures, classifies measures from manual to automatic, covers the recent QE tasks of MT, and organizes the content concisely.
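    As a concrete instance of the lexical-similarity family and of metric meta-evaluation, here is a minimal Python sketch: an F-measure over clipped unigram matches, plus Pearson and Spearman correlations of metric scores against human judgments. All numbers are toy values for illustration only.

        from collections import Counter
        from scipy.stats import pearsonr, spearmanr

        def unigram_f1(hypothesis: str, reference: str) -> float:
            """F-measure over clipped unigram matches (lexical similarity)."""
            hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
            overlap = sum((hyp & ref).values())   # clipped word matches
            if overlap == 0:
                return 0.0
            precision = overlap / sum(hyp.values())
            recall = overlap / sum(ref.values())
            return 2 * precision * recall / (precision + recall)

        # Meta-evaluation: correlate metric scores with human judgments
        # (toy values for illustration only).
        metric_scores = [0.42, 0.61, 0.55, 0.30]
        human_scores = [3.0, 4.5, 4.0, 2.5]
        r, _ = pearsonr(metric_scores, human_scores)     # linear correlation
        rho, _ = spearmanr(metric_scores, human_scores)  # rank correlation
        print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")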

    The impact of machine translation error types on post-editing effort indicators

    In this paper, we report on a post-editing study for general text types from English into Dutch conducted with master's students of translation. We used a fine-grained machine translation (MT) quality assessment method with error weights that correspond to severity levels and are related to cognitive load. Linear mixed-effects models are applied to analyze the impact of MT quality on potential post-editing effort indicators. The impact of MT quality is evaluated on three different levels, each with increasing granularity. We find that MT quality is a significant predictor of all the different types of post-editing effort indicators and that different types of MT errors predict different post-editing effort indicators.
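    A minimal sketch of this kind of analysis, assuming a per-segment table with an error-weight predictor, a post-editing-time response, and a participant identifier (file and column names here are hypothetical, not the study's): a linear mixed-effects model with random intercepts per participant, fitted with statsmodels.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical data: one row per post-edited segment, with a weighted
        # MT error score, a post-editing time, and a participant identifier.
        df = pd.read_csv("postediting_segments.csv")

        # Fixed effect of the weighted MT error score on an effort indicator,
        # with random intercepts per participant (repeated measures).
        model = smf.mixedlm("pe_time ~ error_weight", df,
                            groups=df["participant"])
        print(model.fit().summary())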

    Referenceless Quality Estimation for Natural Language Generation

    Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output. In this paper, we propose a referenceless quality estimation (QE) approach based on recurrent neural networks, which predicts a quality score for an NLG system output by comparing it to the source meaning representation only. Our method outperforms traditional metrics and a constant baseline in most respects; we also show that synthetic data helps to increase correlation results by 21% compared to the base system. Our results are comparable to results obtained in similar QE tasks despite the more challenging setting. Comment: Accepted as a regular paper to the 1st Workshop on Learning to Generate Natural Language (LGNL), Sydney, 10 August 201
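    The general recipe can be sketched as follows: encode the source meaning representation and the system output separately, then regress a quality score from the joint representation. The PyTorch sketch below is a simplified stand-in with assumed dimensions, not the paper's exact architecture.

        import torch
        import torch.nn as nn

        class ReferencelessQE(nn.Module):
            """Sketch of a referenceless QE scorer: two GRU encoders (source
            meaning representation and system output) feed a regression head.
            Architecture and sizes are illustrative assumptions."""

            def __init__(self, vocab_size: int, emb: int = 64, hidden: int = 128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb)
                self.enc_mr = nn.GRU(emb, hidden, batch_first=True)
                self.enc_out = nn.GRU(emb, hidden, batch_first=True)
                self.score = nn.Linear(2 * hidden, 1)

            def forward(self, mr_ids: torch.Tensor, out_ids: torch.Tensor):
                _, h_mr = self.enc_mr(self.embed(mr_ids))
                _, h_out = self.enc_out(self.embed(out_ids))
                joint = torch.cat([h_mr[-1], h_out[-1]], dim=-1)
                return self.score(joint).squeeze(-1)  # predicted quality score

        # Training would regress against human quality ratings, e.g.:
        #   loss = nn.MSELoss()(model(mr_batch, out_batch), ratings)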

    Eye-tracking as a measure of cognitive effort for post-editing of machine translation

    The three measurements of post-editing effort proposed by Krings (2001) have been adopted by many researchers in subsequent studies and publications. These measurements comprise temporal effort (the speed or productivity rate of post-editing, often measured in words per second or per minute at the segment level), technical effort (the number of actual edits performed by the post-editor, sometimes approximated using the Translation Edit Rate metric (Snover et al. 2006), again usually at the segment level), and cognitive effort. Cognitive effort has been measured using Think-Aloud Protocols, pause measurement, and, increasingly, eye-tracking. This chapter provides a review of studies of post-editing effort using eye-tracking, noting the influence of publications by Danks et al. (1997) and O'Brien (2006, 2008), before describing a single study in detail. The detailed study examines whether predicted effort indicators affect post-editing effort; its results were previously published as Moorkens et al. (2015). Most of the eye-tracking data analysed were unused in the previous publication.
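    The temporal and technical effort indicators described above are straightforward to compute; the sketch below shows words-per-second speed and a simplified word-level edit rate. Function and argument names are illustrative, and real TER (Snover et al. 2006) also counts block shifts, which this simplification omits.

        def temporal_effort(segment: str, seconds: float) -> float:
            """Temporal effort: post-editing speed in words per second."""
            return len(segment.split()) / seconds

        def edit_rate(mt_output: str, post_edited: str) -> float:
            """Technical effort as a word-level edit rate: Levenshtein edits
            divided by post-edited length (a simplified stand-in for TER)."""
            a, b = mt_output.split(), post_edited.split()
            d = list(range(len(b) + 1))          # one-row DP table
            for i, wa in enumerate(a, 1):
                prev, d[0] = d[0], i
                for j, wb in enumerate(b, 1):
                    prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                           prev + (wa != wb))
            return d[-1] / max(len(b), 1)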