25,152 research outputs found

    Experiments on domain adaptation for English-Hindi SMT

    Get PDF
    Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of training material for given indomain data leads to narrow lexical coverage which again results in a low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for English—Hindi statistical machine translation and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine clustered sub-corpora with in-domain training data into a unified framework, achieving a 0.44 absolute corresponding to a 4.03% relative improvement in terms of BLEU over the baseline

    Domain adaptation strategies in statistical machine translation: a brief overview

    Get PDF
    © Cambridge University Press, 2015.Statistical machine translation (SMT) is gaining interest given that it can easily be adapted to any pair of languages. One of the main challenges in SMT is domain adaptation because the performance in translation drops when testing conditions deviate from training conditions. Many research works are arising to face this challenge. Research is focused on trying to exploit all kinds of material, if available. This paper provides an overview of research, which copes with the domain adaptation challenge in SMT.Peer ReviewedPostprint (author's final draft

    Discourse Structure in Machine Translation Evaluation

    Full text link
    In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment- and at the system-level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality.Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201

    MATREX: the DCU MT System for WMT 2008

    Get PDF
    In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish–English and French–English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech–English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets

    Real Time Animation of Virtual Humans: A Trade-off Between Naturalness and Control

    Get PDF
    Virtual humans are employed in many interactive applications using 3D virtual environments, including (serious) games. The motion of such virtual humans should look realistic (or ‘natural’) and allow interaction with the surroundings and other (virtual) humans. Current animation techniques differ in the trade-off they offer between motion naturalness and the control that can be exerted over the motion. We show mechanisms to parametrize, combine (on different body parts) and concatenate motions generated by different animation techniques. We discuss several aspects of motion naturalness and show how it can be evaluated. We conclude by showing the promise of combinations of different animation paradigms to enhance both naturalness and control
    corecore