Search CORE

1,191 research outputs found

Accuracy-based scoring for DOT: towards direct error minimization for data-oriented translation

Author: Galron Daniel
Melamed I. Dan
Penkale Sergio
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009
Field of study

In this work we present a novel technique to rescore fragments in the Data-Oriented Translation model based on their contribution to translation accuracy. We describe three new rescoring methods, and present the initial results of a pilot experiment on a small subset of the Europarl corpus. This work is a proof-of-concept, and is the first step in directly optimizing translation decisions solely on the hypothesized accuracy of potential translations resulting from those decisions

CiteSeerX

Irish Universities

DCU Online Research Access Service

Improving Statistical Machine Translation Performance by Oracle-BLEU Model Re-estimation

Author: Dakwale P.
Monz C.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016
Field of study

International Migration, Integration and Social Cohesion online publications

A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

Author: Bisazza Arianna
Federico Marcello
Publication venue: 'MIT Press - Journals'
Publication date: 14/03/2016
Field of study

Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Linguistic Structure in Statistical Machine Translation

Author: Herrmann Teresa
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015
Field of study

This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon predicting the translation for individual words using structural features. When used in phrase-based machine translation, the models improve the translation for language pairs with different word order and morphological variation

KITopen

Analyzing the Potential of Source Sentence Reordering in Statistical Machine Translation

Author: Herrmann Teresa
Niehues Jan
Waibel Alex
Weiner Jochen
Publication venue
Publication date: 20/04/2022
Field of study

KITopen

Strategies for effective utilization of training data for machine translation

Author: Dakwale P.
Publication venue
Publication date: 01/01/2020
Field of study

International Migration, Integration and Social Cohesion online publications

Strategies for effective utilization of training data for machine translation

Author: Dakwale P.
Publication venue
Publication date: 01/01/2020
Field of study

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Reassessing the proper place of man and machine in translation: a pre-translation scenario

Author: Ive Julia
Max Aurélien
Yvon François
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

Traditionally, human--machine interaction to reach an improved machine translation (MT) output takes place ex-post and consists of correcting this output. In this work, we investigate other modes of intervention in the MT process. We propose a Pre-Edition protocol that involves: (a) the detection of MT translation difficulties; (b) the resolution of those difficulties by a human translator, who provides their translations (pre-translation); and (c) the integration of the obtained information prior to the automatic translation. This approach can meet individual interaction preferences of certain translators and can be particularly useful for production environments, where more control over output quality is needed. Early resolution of translation difficulties can prevent downstream errors, thus improving the final translation quality ``for free''. We show that translation difficulty can be reliably predicted for English for various source units. We demonstrate that the pre-translation information can be successfully exploited by an MT system and that the indirect effects are genuine, accounting for around 16% of the total improvement. We also provide a study of the human effort involved in the resolution process

Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

Author: Aziz W.
Eikema B.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest that there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode, under the model distribution. We argue that the evidence corroborates the inadequacy of MAP decoding more than casts doubt on the model and its training algorithm. In this work, we criticise NMT models probabilistically showing that stochastic samples following the model's own generative story do reproduce various statistics of the training data well, but that it is beam search that strays from such statistics. We show that some of the known pathologies of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account statistics gathered from the model distribution holistically. As a proof of concept we show that a straightforward implementation of minimum Bayes risk decoding gives good results outperforming beam search using as little as 30 samples, confirming that MLE-trained NMT models do capture important aspects of translation well in expectation

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE