Search CORE

11,776 research outputs found

Hierarchical Phrase-Based Translation with Suffix Arrays

Author: Lopez Adam
Publication venue
Publication date: 01/01/2007
Field of study

A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this prob-lem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly lookup and ex-tract rules on the fly. Hierarchical phrase-based translation introduces the added wrin-kle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply and the best approximate pat-tern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierar-chical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps.

CiteSeerX

Edinburgh Research Explorer

Neural fuzzy repair : integrating fuzzy matches into neural machine translation

Author: Bulté Bram
Tezcan Arda
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019
Field of study

We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). We propose and test two methods for augmenting NMT training data with fuzzy TM matches. Tests on the DGT-TM data set for two language pairs show consistent and substantial improvements over a range of baseline systems. The results suggest that this method is promising for any translation environment in which a sizeable TM is available and a certain amount of repetition across translations is to be expected, especially considering its ease of implementation

Crossref

Ghent University Academic Bibliography

Improving the post-editing experience using translation recommendation: a user study

Author: He Yifan
Ma Yanjun
Roturier Johann
van Genabith Josef
Way Andy
Publication venue: Association for Machine Translation in the Americas
Publication date: 01/01/2010
Field of study

We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the effectiveness of the model as well as the reaction of potential users. Based on the performance statistics and the users’comments, we find that translation recommendation can reduce the workload of professional post-editors and improve the acceptance of MT in the localization industry

Irish Universities

DCU Online Research Access Service

Combining semantic and syntactic generalization in example-based machine translation

Author: Ebling Sarah
Kumar Naskar Sudip
Volk Martin
Way Andy
Publication venue: European Association for Machine Translation
Publication date: 30/05/2011
Field of study

In this paper, we report our experiments in combining two EBMT systems that rely on generalized templates, Marclator and CMU-EBMT, on an English–German translation task. Our goal was to see whether a statistically signiﬁcant improvement could be achieved over the individual performances of these two systems. We observed that this was not the case. However, our system consistently outperformed a lexical EBMT baseline system

CiteSeerX

Irish Universities

DCU Online Research Access Service

Dependency relations as source context in phrase-based SMT

Author: Haque Rejwanul
Naskar Sudip Kumar
van den Bosch Antal
Way Andy
Publication venue
Publication date: 01/01/2009
Field of study

The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch—English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline

Waseda University Repository

DCU Online Research Access Service

Exploiting source similarity for SMT using context-informed features

Author: Stroppa Nicolas
van den Bosch Antal
Way Andy
Publication venue
Publication date: 01/01/2007
Field of study

In this paper, we introduce context informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to target similarity modeled by the language model. We present a memory-based classification framework that enables the estimation of these features while avoiding sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT system, and report significant improvements for both BLEU and NIST scores when adding the context-informed features

Irish Universities

DCU Online Research Access Service

Approximating solution structure of the Weighted Sentence Alignment problem

Author: Kolokolova Antonina
Nizamee Renesa
Publication venue
Publication date: 08/09/2014
Field of study

We study the complexity of approximating solution structure of the bijective weighted sentence alignment problem of DeNero and Klein (2008). In particular, we consider the complexity of finding an alignment that has a significant overlap with an optimal alignment. We discuss ways of representing the solution for the general weighted sentence alignment as well as phrases-to-words alignment problem, and show that computing a string which agrees with the optimal sentence partition on more than half (plus an arbitrarily small polynomial fraction) positions for the phrases-to-words alignment is NP-hard. For the general weighted sentence alignment we obtain such bound from the agreement on a little over 2/3 of the bits. Additionally, we generalize the Hamming distance approximation of a solution structure to approximating it with respect to the edit distance metric, obtaining similar lower bounds

arXiv.org e-Print Archive

CiteSeerX

Hybrid rule-based - example-based MT: feeding apertium with sub-sentential translation units

Author: Forcada Mikel
Sánchez-Martínez Felipe
Way Andy
Publication venue
Publication date: 01/01/2009
Field of study

This paper describes a hybrid machine translation (MT) approach that consists of integrating bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In the integration of bilingual chunks, special care has been taken so as not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for the translation from English-to-Spanish, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used

Repositorio Institucional de la Universidad de Alicante

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Irish Universities

DCU Online Research Access Service