66 research outputs found
MaTrEx: the DCU machine translation system for ICON 2008
In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the NLP Tools Contest of the International Conference on Natural Language Processing (ICON 2008). This was our first ever attempt at working on any Indian language. In this participation, we focus on various techniques for word and phrase alignment to improve system quality. For the English-Hindi translation task we exploit source-language reordering. We also carried out experiments combining both in-domain and out-of-domain data to improve the system performance and, as a post-processing step, we transliterate out-of-vocabulary items.
Using supertags as source language context in SMT
Recent research has shown that Phrase-Based Statistical Machine Translation (PB-SMT) systems can benefit from two enhancements: (i) using words and POS tags as context-informed features on the source side; and (ii) incorporating lexical syntactic descriptions in the form of supertags on the target side. In this work we present a novel PB-SMT model that combines these two aspects by using supertags as source language context-informed features. These features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. In our experiments two kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and Combinatory Categorial Grammar. We use a memory-based classification framework that enables the estimation of these features while avoiding problems of sparseness. Despite the differences between these two approaches, the supertaggers give similar improvements. We evaluate the performance of our approach on an English-to-Chinese translation task using a state-of-the-art phrase-based SMT system, and report an improvement of 7.88% BLEU score in translation quality when adding supertags as context-informed features.
Combining semantic and syntactic generalization in example-based machine translation
In this paper, we report our experiments in combining two EBMT systems that rely on generalized templates, Marclator and CMU-EBMT, on an English–German translation task. Our goal was to see whether a statistically significant improvement could be achieved over the individual performances of these two systems. We observed that this was not the case. However, our system consistently outperformed a lexical EBMT baseline system.
Comparative evaluation of research vs. Online MT systems
This paper reports MT evaluation experiments that were conducted at the end of year 1 of the EU-funded CoSyne project for three language combinations, considering translations from German, Italian and Dutch into English. We present a comparative evaluation of the MT software developed within the project against four of the leading free web-based MT systems across a range of state-of-the-art automatic evaluation metrics. The data sets from the news domain that were created and used for training purposes and also for this evaluation exercise, which are available to the research community, are also described. The evaluation results for the news domain are very encouraging: the CoSyne MT software consistently beats the rule-based MT systems, and for translations from Italian and Dutch into English in particular the scores given by some of the standard automatic evaluation metrics are not too distant from those obtained by well-established statistical online MT systems.
Mitigating problems in analogy-based EBMT with SMT and vice versa: a case study with named entity transliteration
Five years ago, a number of papers reported an experimental implementation of an Example Based Machine Translation (EBMT) system using proportional analogy. This approach, a type of analogical learning, was attractive because of its simplicity; and the papers reported considerable success with the method using various language pairs. In this paper, we describe our attempt to use this approach for tackling English–Hindi Named Entity (NE) Transliteration. We have implemented our own EBMT system using proportional analogy and have found that the analogy-based system on its own has low precision but a high recall due to the fact that a large number of names are untransliterated with the approach. However, mitigating problems in analogy-based EBMT with SMT and vice versa has shown considerable improvement over the individual approaches.
Statistically motivated example-based machine translation using translation memory
In this paper we present a novel way of integrating Translation Memory into an Example-based Machine Translation (EBMT) system to deal with the issue of low resources. We have used a dialogue of 380 sentences as the example-base for our system. The translation units in the Translation Memories are automatically extracted based on the aligned phrases (words) of a statistical machine translation (SMT) system. We attempt to use the approach to improve translation from English to Bangla as many statistical machine translation systems have difficulty with such small amounts of training data. We have found the approach shows improvement over a baseline SMT system.
English-Hindi transliteration using context-informed PB-SMT: the DCU system for NEWS 2009
This paper presents English–Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framework that enables efficient estimation of these features while avoiding data sparseness problems. We carried out experiments both at character and transliteration unit (TU) level. Position-dependent source context features produce significant improvements in terms of all evaluation metrics.
Experiments on domain adaptation for English-Hindi SMT
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of in-domain training material leads to narrow lexical coverage, which again results in low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for English–Hindi statistical machine translation and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine clustered sub-corpora with in-domain training data into a unified framework, achieving a 0.44 absolute improvement in BLEU over the baseline, corresponding to a 4.03% relative improvement.
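The relationship between the absolute and relative BLEU figures quoted above can be checked with a short calculation. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the baseline score it derives is only what the two reported numbers imply, since the abstract does not state the baseline itself.

```python
def implied_baseline(absolute_gain: float, relative_gain_pct: float) -> float:
    """Return the baseline score implied by an absolute gain and the
    same gain expressed as a percentage of the baseline."""
    if relative_gain_pct <= 0:
        raise ValueError("relative gain must be positive")
    return absolute_gain / (relative_gain_pct / 100.0)


def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement over the baseline as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Figures from the abstract: 0.44 absolute, 4.03% relative.
    baseline = implied_baseline(0.44, 4.03)
    print(f"implied baseline BLEU: {baseline:.2f}")
    print(f"relative gain check: {relative_gain(baseline, baseline + 0.44):.2f}%")
```

Under these assumptions the two reported figures are consistent with a baseline of roughly 11 BLEU points.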
Dependency relations as source context in phrase-based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch–English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline.
A review of EBMT using proportional analogies
Some years ago a number of papers reported an experimental implementation of Example Based Machine Translation (EBMT) using Proportional Analogy. This approach, a type of analogical learning, was attractive because of its simplicity; and the papers reported considerable success with the method. This paper reviews what we believe to be the totality of research reported using this method, as an introduction to our own experiments in this framework, reported in a companion paper. We report first some lack of clarity in the previously published work, and then report our findings that the purity of the proportional analogy approach imposes huge run-time complexity for the EBMT task even when heuristics as hinted at in the original literature are applied to reduce the amount of computation.
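For readers unfamiliar with the term, a proportional analogy asks for a string D such that A is to B as C is to D. The sketch below is a deliberately restricted illustration that handles only suffix replacement; it is an assumption made for this example, not the general analogy solver used in the reviewed work, and the function name is hypothetical.

```python
from typing import Optional


def solve_suffix_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : d for the restricted case where a and b
    differ only in their suffix. Return None if c does not fit."""
    # Find the length of the longest common prefix of a and b.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    suffix_a, suffix_b = a[i:], b[i:]
    # c must end with the suffix that a loses when it becomes b.
    if not c.endswith(suffix_a):
        return None
    # Swap the suffix of c in the same way a's suffix was swapped.
    return c[: len(c) - len(suffix_a)] + suffix_b


if __name__ == "__main__":
    print(solve_suffix_analogy("walk", "walked", "talk"))
    print(solve_suffix_analogy("reader", "reading", "writer"))
    print(solve_suffix_analogy("reader", "reading", "talk"))
```

An unrestricted solver has to consider many more ways of aligning the three strings, and an EBMT system must try such equations against many triples from the example base, which gives a rough sense of why run-time cost becomes the concern raised in the review.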