387 research outputs found

    Sentence-level quality estimation for MT system combination

    This paper provides the system description of the Dublin City University system combination module for our participation in the system combination task of the Second Workshop on Applying Machine Learning Techniques to Optimize the Division of Labour in Hybrid MT (ML4HMT-12). We incorporated a sentence-level quality score, obtained by sentence-level Quality Estimation (QE), as meta-information to guide system combination. Instead of using BLEU or (minimum average) TER, we select the backbone for the confusion network using the estimated quality score. For the Spanish-English data, our strategy yielded an improvement of 0.89 absolute BLEU points over the best single score and 0.20 absolute BLEU points over the standard system combination strategy.
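
    The two backbone-selection criteria contrasted in this abstract can be sketched as follows. This is an illustrative sketch only, not the DCU implementation: the function names are hypothetical, the QE scores are assumed to be supplied by some external estimator (higher is better), and a plain word-level edit distance stands in for TER, which additionally allows block shifts.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (a stand-in for TER; no shift operation)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (wa != wb)))  # substitution or match
        prev = cur
    return prev[-1]


def backbone_by_min_avg_distance(hyps: List[str]) -> int:
    """Index of the hypothesis with the lowest average normalized distance to the others."""
    if len(hyps) == 1:
        return 0
    tokens = [h.split() for h in hyps]

    def avg_distance(i: int) -> float:
        others = [t for j, t in enumerate(tokens) if j != i]
        return sum(word_edit_distance(tokens[i], t) / max(len(t), 1)
                   for t in others) / len(others)

    return min(range(len(hyps)), key=avg_distance)


def backbone_by_quality(hyps: List[str], qe_scores: List[float]) -> int:
    """Index of the hypothesis with the highest estimated quality (assumed: higher is better)."""
    return max(range(len(hyps)), key=lambda i: qe_scores[i])


# Toy data: three system outputs and made-up QE scores.
hyps = ["the cat sat on the mat", "the cat sat on a mat", "a cat is on the mat"]
qe_scores = [0.4, 0.5, 0.9]
print(backbone_by_min_avg_distance(hyps), backbone_by_quality(hyps, qe_scores))
```

    The two criteria can disagree: the distance-based choice favours the output closest to the consensus, while the QE-based choice favours the output judged best on its own.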

    Enhancing scarce-resource language translation through pivot combinations

    Chinese and Spanish are two of the most widely spoken languages in the world. However, little research has been done on machine translation for this language pair. We experiment with the parallel Chinese-Spanish corpus (United Nations) to explore alternative SMT strategies that consist of using a pivot language. In particular, two well-known alternatives for pivoting are shown: the cascade system and the pseudo-corpus. As pivot languages we use English, Arabic and French. Results show that English is the best pivot language between Chinese and Spanish. As a new strategy, we propose a combination of the pivot strategies, which substantially outperforms the direct translation strategy.
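
    The cascade approach mentioned in this abstract chains two translation systems through the pivot language. A minimal sketch, under stated assumptions: the two translators are hypothetical callables returning scored n-best lists, the toy lookup tables stand in for real SMT systems, and multiplying the two scores is one simple way to rank cascaded outputs, not necessarily the one used in the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

NBest = List[Tuple[str, float]]  # (translation, score) pairs, best first


def cascade(src: str,
            src_to_pivot: Callable[[str], NBest],
            pivot_to_tgt: Callable[[str], NBest],
            n: int = 2) -> Optional[Tuple[str, float]]:
    """Translate src -> pivot -> target, keeping the best-scoring target string."""
    best: Dict[str, float] = {}
    for pivot_text, p1 in src_to_pivot(src)[:n]:
        for tgt_text, p2 in pivot_to_tgt(pivot_text)[:n]:
            score = p1 * p2  # assumed combination rule: product of the two scores
            if score > best.get(tgt_text, 0.0):
                best[tgt_text] = score
    if not best:
        return None
    return max(best.items(), key=lambda kv: kv[1])


# Toy lookup tables standing in for Chinese->English and English->Spanish systems.
ZH_EN = {"你好": [("hello", 0.7), ("hi", 0.3)]}
EN_ES = {"hello": [("hola", 0.9)], "hi": [("hola", 0.6), ("buenas", 0.4)]}

result = cascade("你好",
                 lambda s: ZH_EN.get(s, []),
                 lambda s: EN_ES.get(s, []))
print(result)
```

    The pseudo-corpus alternative would instead translate the pivot side of a pivot-target corpus into the source language and train a direct system on the resulting synthetic pairs.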

    An augmented three-pass system combination framework: DCU combination system for WMT 2010

    This paper describes the augmented three-pass system combination framework of the Dublin City University (DCU) MT group for the WMT 2010 system combination task. The basic three-pass framework includes building individual confusion networks (CNs), a super network, and a modified Minimum Bayes-risk (mConMBR) decoder. The augmented parts for the WMT 2010 tasks include 1) a rescoring component, which is used to re-rank the N-best lists generated from the individual CNs and the super network, 2) a new hypothesis alignment metric, TERp, which is used to carry out English-targeted hypothesis alignment, and 3) additional backbone-based CNs, which are employed to increase the diversity of the mConMBR decoding phase. We took part in the English-to-Czech and French-to-English combination tasks. Experimental results show that our proposed combination framework improved BLEU over the best single system by 2.17 absolute points (13.36 relative points) on English-to-Czech and by 1.52 absolute points (5.37 relative points) on French-to-English. We also achieved better performance in human evaluation.

    A three-pass system combination framework by combining multiple hypothesis alignment methods

    Many effective hypothesis alignment metrics have been proposed and applied to system combination, such as TER, HMM, ITER and IHMM. In addition, Minimum Bayes-risk (MBR) decoding and the confusion network (CN) have become state-of-the-art techniques in system combination. In this paper, we present a three-pass system combination strategy that combines hypothesis alignment results derived from different alignment metrics to generate a better translation. First, the different alignment metrics are used to align the backbone and the hypotheses, and an individual CN is built for each alignment result. Then, we construct a super network by merging the multiple metric-based CNs and generate a consensus output. Finally, a modified consensus network MBR (ConMBR) approach is employed to search for the best translation. Our proposed strategy outperforms the best single CN as well as the best single system in our experiments on the NIST Chinese-to-English test set.
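
    The MBR decoding referred to in this abstract picks the hypothesis with the lowest expected loss against the other candidates, rather than the single most probable one. The sketch below shows plain MBR over an N-best list, not the authors' modified ConMBR: the function names are hypothetical, and a Jaccard-based loss over word sets is assumed as a simple stand-in for a loss derived from BLEU or TER.

```python
from typing import Callable, List


def jaccard_loss(a: str, b: str) -> float:
    """1 minus the Jaccard overlap of the two word sets (assumed stand-in loss)."""
    wa, wb = set(a.split()), set(b.split())
    if not wa and not wb:
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)


def mbr_select(hyps: List[str],
               probs: List[float],
               loss: Callable[[str, str], float] = jaccard_loss) -> int:
    """Index of the hypothesis minimizing expected loss under the normalized posterior."""
    total = sum(probs)

    def risk(i: int) -> float:
        return sum((p / total) * loss(hyps[i], h) for h, p in zip(hyps, probs))

    return min(range(len(hyps)), key=risk)


# Toy N-best list: the most probable hypothesis (index 2) is an outlier.
hyps = ["a b c", "a b d", "x y z"]
probs = [0.3, 0.3, 0.4]
choice = mbr_select(hyps, probs)
print(hyps[choice])
```

    In the toy list, the highest-probability hypothesis shares no words with the other two, so its expected loss is the largest and MBR prefers one of the two mutually similar candidates.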

    Using TERp to augment the system combination for SMT

    TER-Plus (TERp) is an extended TER evaluation metric incorporating morphology, synonymy and paraphrases. There are three new edit operations in TERp: Stem Matches, Synonym Matches and Phrase Substitutions (Paraphrases). In this paper, we propose a TERp-based augmented system combination in terms of backbone selection and the consensus decoding network. Combining the new properties of TERp, we also propose a two-pass decoding strategy for the lattice-based phrase-level confusion network (CN) to generate the final result. Experiments conducted on the NIST 2008 Chinese-to-English test set show that our TERp-based augmented system combination framework achieves significant improvements in terms of BLEU and TERp scores compared to the state-of-the-art word-level system combination framework and a TER-based combination strategy.