1,157 research outputs found

    MaTrEx: the DCU machine translation system for ICON 2008

    Get PDF
    In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the NLP Tools Contest of the International Conference on Natural Language Processing (ICON 2008). This was our first ever attempt at working on any Indian language. In this participation, we focus on various techniques for word and phrase alignment to improve system quality. For the English-Hindi translation task we exploit source-language reordering. We also carried out experiments combining both in-domain and out-of-domain data to improve the system performance and, as a post-processing step we transliterate out-of-vocabulary items

    Dependency relations as source context in phrase-based SMT

    Get PDF
    The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch—English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline

    MATREX: the DCU MT System for WMT 2008

    Get PDF
    In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish–English and French–English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech–English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets

    Using supertags as source language context in SMT

    Get PDF
    Recent research has shown that Phrase-Based Statistical Machine Translation (PB-SMT) systems can benefit from two enhancements: (i) using words and POS tags as context-informed features on the source side; and (ii) incorporating lexical syntactic descriptions in the form of supertags on the target side. In this work we present a novel PB-SMT model that combines these two aspects by using supertags as source language contextinformed features. These features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. In our experiments two kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and Combinatory Categorial Grammar. We use a memory-based classification framework that enables the estimation of these features while avoiding problems of sparseness. Despite the differences between these two approaches, the supertaggers give similar improvements. We evaluate the performance of our approach on an English-to-Chinese translation task using a state-of-the-art phrase-based SMT system, and report an improvement of 7.88% BLEU score in translation quality when adding supertags as context-informed features

    Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories

    Get PDF
    The strict character of most of the existing Machine Translation (MT) evaluation metrics does not permit them to capture lexical variation in translation. However, a central issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. In order to achieve a higher correlation, the identification of sense correspondences between the compared translations becomes really important. Given that most metrics are looking for exact correspondences, the evaluation results are often misleading concerning translation quality. Apart from that, existing metrics do not permit one to make a conclusive estimation of the impact of Word Sense Disambiguation techniques into MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR’s synonymy module operable in French. The evaluation results demonstrate that the use of these inventories gives rise to an increase in the number of matches and the correlation with human judgments of translation quality, compared to precision-based metrics

    Experiments on domain adaptation for patent machine translation in the PLuTO project

    Get PDF
    The PLUTO1 project (Patent Language Translations Online) aims to provide a rapid solution for the online retrieval and translation of patent documents through the integration of a number of existing state-of-the-art components provided by the project partners. The paper presents some of the experiments on patent domain adaptation of the Machine Translation (MT) systems used in the PLuTO project. The experiments use the International Patent Classification for domain adaptation and are focused on the English–French language pair

    Parallel Treebanks in Phrase-Based Statistical Machine Translation

    Get PDF
    Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT

    Improving the objective function in minimum error rate training

    Get PDF
    In Minimum Error Rate Training (MERT), the parameters of an SMT system are tuned on a certain evaluation metric to improve translation quality. In this paper, we present empirical results in which parameters tuned on one metric (e.g. BLEU) may not lead to optimal scores on the same metric. The score can be improved significantly by tuning on an entirely different metric (e.g. METEOR, by 0.82 BLEU points or 3.38% relative improvement on WMT08 English–French dataset). We analyse the impact of choice of objective function in MERT and further propose three combination strategies of different metrics to reduce the bias of a single metric, and obtain parameters that receive better scores (0.99 BLEU points or 4.08% relative improvement) on evaluation metrics than those tuned on the standalone metric itself
    corecore