41,183 research outputs found
Dependency relations as source context in phrase-based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical
choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and
supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch—English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline
Using supertags as source language context in SMT
Recent research has shown that Phrase-Based Statistical Machine Translation (PB-SMT) systems can benefit from two
enhancements: (i) using words and POS tags as context-informed features on the source side; and (ii) incorporating lexical syntactic descriptions in the form of supertags on the target side. In this work we
present a novel PB-SMT model that combines these two aspects by using supertags as source language contextinformed features. These features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. In our experiments two
kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and Combinatory Categorial Grammar.
We use a memory-based classification framework that enables the estimation of these features while avoiding
problems of sparseness. Despite the differences between these two approaches, the supertaggers give similar improvements. We evaluate the performance of our approach on an English-to-Chinese translation task using a state-of-the-art phrase-based SMT system, and report an
improvement of 7.88% BLEU score in translation quality when adding supertags as context-informed features
Exploiting source similarity for SMT using context-informed features
In this paper, we introduce context informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to target similarity modeled by the language model. We
present a memory-based classification framework that enables the estimation of these features while avoiding
sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT
system, and report significant improvements for both BLEU and NIST scores when adding the context-informed features
Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories
The strict character of most of the existing Machine Translation (MT) evaluation metrics does not permit them to capture lexical variation in translation. However, a central
issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. In order to achieve a higher correlation, the identification of sense correspondences between the compared translations becomes really important. Given
that most metrics are looking for exact correspondences, the evaluation results are often misleading concerning translation quality. Apart from that, existing metrics do not permit one to make a conclusive estimation of the impact of Word Sense Disambiguation techniques into
MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR’s synonymy module operable in French. The evaluation results demonstrate that the use of these inventories gives rise to an increase in the number of matches and the correlation with human judgments of translation quality, compared to precision-based metrics
- …