68,384 research outputs found

    TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies

    Get PDF
    This paper presents an extended work on the trilingual spoken language translation corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC), namely TriECCC. TriECCC is a simultaneously spoken language translation corpus with parallel resources of speech and text in three languages: Khmer, English, and French. This corpus has approximately [Formula: see text] thousand utterances, approximately [Formula: see text], [Formula: see text], and [Formula: see text] h in length of speech, and [Formula: see text], [Formula: see text] and [Formula: see text] million words in text, in Khmer, English, and French, respectively. We first report the baseline results of machine translation (MT), and speech translation (ST) systems, which show reasonable performance. We then investigate the use of the ROVER method to combine multiple MT outputs and fine-tune the pre-trained English–French MT models to enhance the Khmer MT systems. Experimental results show that the ROVER is effective for combining English-to-Khmer and French-to-Khmer systems. Fine-tuning from both single and multiple parents shows the effective improvement on the BLEU scores for Khmer-to-English/French and English/French-to-Khmer MT systems

    An incremental three-pass system combination framework by combining multiple hypothesis alignment methods

    Get PDF
    System combination has been applied successfully to various machine translation tasks in recent years. As is known, the hypothesis alignment method is a critical factor for the translation quality of system combination. To date, many effective hypothesis alignment metrics have been proposed and applied to the system combination, such as TER, HMM, ITER, IHMM, and SSCI. In addition, Minimum Bayes-risk (MBR) decoding and confusion networks (CN) have become state-of-the-art techniques in system combination. In this paper, we examine different hypothesis alignment approaches and investigate how much the hypothesis alignment results impact on system combination, and finally present a three-pass system combination strategy that can combine hypothesis alignment results derived from multiple alignment metrics to generate a better translation. Firstly, these different alignment metrics are carried out to align the backbone and hypotheses, and the individual CNs are built corresponding to each set of alignment results; then we construct a ‘super network’ by merging the multiple metric-based CNs to generate a consensus output. Finally a modified MBR network approach is employed to find the best overall translation. Our proposed strategy outperforms the best single confusion network as well as the best single system in our experiments on the NIST Chinese-to-English test set and the WMT2009 English-to-French system combination shared test set

    Source-side context-informed hypothesis alignment for combining outputs from machine translation systems

    Get PDF
    This paper presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. Traditional hypothesis alignment algorithms such as TER, HMM and IHMM do not directly utilise the context information of the source side but rather address the alignment issues via the output data itself. In this paper, a source-side context-informed (SSCI) hypothesis alignment method is proposed to carry out the word alignment and word reordering issues. First of all, the source–target word alignment links are produced as the hidden variables by exporting source phrase spans during the translation decoding process. Secondly, a mapping strategy and normalisation model are employed to acquire the 1- to-1 alignment links and build the confusion network (CN). The source-side context-based method outperforms the state-of-the-art TERbased alignment model in our experiments on the WMT09 English-to-French and NIST Chinese-to-English data sets respectively. Experimental results demonstrate that our proposed approach scores consistently among the best results across different data and language pair conditions

    A three-pass system combination framework by combining multiple hypothesis alignment methods

    Get PDF
    So far, many effective hypothesis alignment metrics have been proposed and applied to the system combination, such as TER, HMM, ITER and IHMM. In addition, the Minimum Bayes-risk (MBR) decoding and the confusion network (CN) have become the state-of-the art techniques in system combination. In this paper, we present a three-pass system combination strategy that can combine hypothesis alignment results derived from different alignment metrics to generate a better translation. Firstly the different alignment metrics are carried out to align the backbone and hypotheses, and the individual CN is built corresponding to each alignment results; then we construct a super network by merging the multiple metric-based CN and generate a consensus output. Finally a modified consensus network MBR (ConMBR) approach is employed to search a best translation. Our proposed strategy out performs the best single CN as well as the best single system in our experiments on NIST Chinese-to-English test set

    Sentence-level quality estimation for MT system combination

    Get PDF
    This paper provides the system description of the Dublin City University system combination module for our participation in the system combination task in the Second Workshop on Applying Machine Learning Techniques to Optimize the Division of Labour in Hybrid MT (ML4HMT- 12). We incorporated a sentence-level quality score, obtained by sentence-level Quality Estimation (QE), as meta information guiding system combination. Instead of using BLEU or (minimum average) TER, we select a backbone for the confusion network using the estimated quality score. For the Spanish-English data, our strategy improved 0.89 BLEU points absolute compared to the best single score and 0.20 BLEU points absolute compared to the standard system combination strateg

    Neural System Combination for Machine Translation

    Full text link
    Neural machine translation (NMT) becomes a new approach to machine translation and generates much more fluent results compared to statistical machine translation (SMT). However, SMT is usually better than NMT in translation adequacy. It is therefore a promising direction to combine the advantages of both NMT and SMT. In this paper, we propose a neural system combination framework leveraging multi-source NMT, which takes as input the outputs of NMT and SMT systems and produces the final translation. Extensive experiments on the Chinese-to-English translation task show that our model archives significant improvement by 5.3 BLEU points over the best single system output and 3.4 BLEU points over the state-of-the-art traditional system combination methods.Comment: Accepted as a short paper by ACL-201

    Combining semantic and syntactic generalization in example-based machine translation

    Get PDF
    In this paper, we report our experiments in combining two EBMT systems that rely on generalized templates, Marclator and CMU-EBMT, on an English–German translation task. Our goal was to see whether a statistically significant improvement could be achieved over the individual performances of these two systems. We observed that this was not the case. However, our system consistently outperformed a lexical EBMT baseline system
    corecore