
    An open source rule induction tool for transfer-based SMT

    In this paper we describe an open-source tool for automatic induction of transfer rules. Transfer rule induction is carried out on pairs of dependency structures and their node alignment to produce all rules consistent with the node alignment. We describe an efficient algorithm for rule induction and give a detailed description of how to use the tool.
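
    The core of the approach is alignment-consistent rule extraction over dependency-tree pairs. The following is a minimal sketch of that consistency check, assuming a toy tree representation; the Node class, extract_rules, and the identity-based alignment links are illustrative, not the tool's actual API.

```python
# A minimal sketch of alignment-consistent rule extraction over
# dependency-tree pairs; data structures are illustrative, not the
# tool's actual API.
from dataclasses import dataclass, field

@dataclass(eq=False)  # keep identity hashing so nodes can live in sets
class Node:
    word: str
    children: list = field(default_factory=list)

def descendants(node):
    """Yield a node together with all of its descendants."""
    yield node
    for child in node.children:
        yield from descendants(child)

def extract_rules(src_root, tgt_root, alignment):
    """Collect (source subtree, target subtree) pairs consistent with
    the node alignment: every alignment link touching either subtree
    must lie entirely inside both."""
    rules = []
    for s in descendants(src_root):
        s_nodes = set(descendants(s))
        for t in descendants(tgt_root):
            t_nodes = set(descendants(t))
            links = [(a, b) for (a, b) in alignment
                     if a in s_nodes or b in t_nodes]
            if links and all(a in s_nodes and b in t_nodes for a, b in links):
                rules.append((s, t))
    return rules
```

    This brute-force pairing is quadratic in the number of subtree pairs; the abstract's "efficient algorithm" refers to something better than this naive enumeration.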

    Learning to Parse and Translate Improves Neural Machine Translation

    There has been relatively little attention to incorporating linguistic priors into neural machine translation, and much of the previous work was further constrained to considering such priors on the source side only. In this paper, we propose a hybrid model, called NMT+RNNG, that learns to parse and translate by combining a recurrent neural network grammar with attention-based neural machine translation. Our approach encourages the neural machine translation model to incorporate linguistic priors during training, and lets it translate on its own afterward. Extensive experiments with four language pairs show the effectiveness of the proposed NMT+RNNG. (Comment: accepted as a short paper at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017.)
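
    As a rough illustration of the joint objective, the sketch below trains one shared decoder state with two prediction heads, a word head for translation and an action head for parsing. All sizes, the single-layer GRU, and the three-action inventory are simplifying assumptions, not the paper's exact NMT+RNNG architecture.

```python
# A toy joint translate-and-parse objective: one decoder state feeds
# both a word head and a parser-action head; everything here is a
# simplified stand-in for the paper's architecture.
import torch
import torch.nn as nn

class JointDecoder(nn.Module):
    def __init__(self, vocab_size, num_actions, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab_size)      # translation
        self.action_head = nn.Linear(hidden, num_actions)   # parsing

    def forward(self, prev_words):
        states, _ = self.rnn(self.embed(prev_words))
        return self.word_head(states), self.action_head(states)

decoder = JointDecoder(vocab_size=1000, num_actions=3)
prev = torch.randint(0, 1000, (4, 7))          # previous target tokens
gold_words = torch.randint(0, 1000, (4, 7))    # next-word targets
gold_actions = torch.randint(0, 3, (4, 7))     # SHIFT/REDUCE/NT targets

word_logits, action_logits = decoder(prev)
ce = nn.CrossEntropyLoss()
# Joint loss: translation NLL plus parser-action NLL on shared states.
loss = ce(word_logits.flatten(0, 1), gold_words.flatten()) \
     + ce(action_logits.flatten(0, 1), gold_actions.flatten())
loss.backward()
```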

    Translating Phrases in Neural Machine Translation

    Phrases play an important role in natural language understanding and machine translation (Sag et al., 2002; Villavicencio et al., 2005). However, it is difficult to integrate them into current neural machine translation (NMT), which reads and generates sentences word by word. In this work, we propose a method to translate phrases in NMT by integrating a phrase memory, storing target phrases from a phrase-based statistical machine translation (SMT) system, into the encoder-decoder architecture of NMT. At each decoding step, the phrase memory is first re-written by the SMT model, which dynamically generates relevant target phrases using contextual information provided by the NMT model. The proposed model then reads the phrase memory to estimate probabilities for all phrases it contains. If a phrase is to be generated, the NMT decoder selects an appropriate phrase from the memory and updates its decoding state by consuming the words in the selected phrase; otherwise, the NMT decoder generates a word from the vocabulary as a standard NMT decoder does. Experimental results on Chinese-to-English translation show that the proposed model achieves significant improvements over the baseline on various test sets. (Comment: accepted by EMNLP 2017.)
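
    The decoding step can be pictured as choosing between emitting one vocabulary word and consuming a whole memory phrase. Below is a minimal sketch of that choice under toy scoring functions; the score dictionaries and the list-based "state" are placeholders, not the paper's model.

```python
# A toy version of the word-vs-phrase decoding choice; scores and the
# decoder "state" are placeholders, not the paper's actual model.
import math

def softmax(scores):
    m = max(scores.values())
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def decode_step(state, word_scores, phrase_memory, phrase_scores):
    """Pick the best continuation from the union of vocabulary words
    and memory phrases, then advance the state by every word the
    choice contains (one word, or all words of a phrase)."""
    candidates = dict(word_scores)
    for phrase in phrase_memory:
        candidates[phrase] = phrase_scores[phrase]
    probs = softmax(candidates)
    best = max(probs, key=probs.get)
    emitted = best.split() if best in phrase_memory else [best]
    for w in emitted:
        state = state + [w]   # stand-in for a real decoder state update
    return state, emitted

state, out = decode_step(
    state=["<s>"],
    word_scores={"the": 0.2, "a": 0.1},
    phrase_memory={"new york city"},
    phrase_scores={"new york city": 1.5},
)
print(out)  # ['new', 'york', 'city']
```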

    Neural probabilistic language model for system combination

    This paper describes the system of the neural probabilistic language modeling (NPLM) team of Dublin City University for the system combination task at the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). We used the information obtained by the NPLM as meta-information for the system combination module. For the Spanish-English data, our paraphrasing approach achieved 25.81 BLEU points, 0.19 BLEU points absolute below the standard confusion-network-based system combination. We note that our current usage of the NPLM is very limited due to the difficulty of combining the NPLM with system combination.
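
    One simple way to picture "NPLM as meta-information" is as an extra feature when rescoring combination candidates. The sketch below is a toy version under that assumption; lm_logprob and the feature weights are invented for illustration and do not reflect the paper's actual integration.

```python
# A toy rescorer that adds a language-model score as one extra feature
# on top of a combination confidence; both features are illustrative.
def lm_logprob(sentence):
    """Stand-in for a neural LM; here it simply favors shorter output."""
    return -0.5 * len(sentence.split())

def rescore(candidates, weight_conf=1.0, weight_lm=1.0):
    """candidates: list of (hypothesis, combination_confidence)."""
    def score(item):
        hyp, conf = item
        return weight_conf * conf + weight_lm * lm_logprob(hyp)
    return max(candidates, key=score)[0]

best = rescore([("the house is big", 0.7), ("the house is very big", 0.8)])
print(best)  # the house is big
```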

    ProphetMT: a tree-based SMT-driven controlled language authoring/post-editing tool

    This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a ‘Controlled Language’ that is ‘understood’ by the computer. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is promptly generated on the fly with the help of the structural information. Our experiments on English-to-Chinese show that the proposed ProphetMT system can not only better regularise an author’s writing behaviour but also significantly improve translation fluency, which is vital for reducing post-editing time. Additionally, once writing and translation are complete, ProphetMT provides an effective colour scheme that further improves the productivity of post-editors by explicitly showing the relations between the source and target rules.
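
    The auto-suggestion component can be approximated as prefix matching against source-side rules paired with their target templates. The sketch below is a minimal stand-in; the rule table and matching strategy are invented for illustration, not ProphetMT's grammar format.

```python
# A toy rule auto-suggester: prefix-match what the author has typed
# against source-side rule templates; the table is made up.
RULES = {
    "X agrees to Y": "X 同意 Y",
    "X agrees with Y": "X 赞同 Y",
    "in order to X": "为了 X",
}

def suggest(prefix):
    """Return rules that could complete the typed prefix, paired with
    their target-side templates (the basis for partial translations)."""
    prefix = prefix.lower().strip()
    return [(src, tgt) for src, tgt in RULES.items()
            if src.lower().startswith(prefix)]

for src, tgt in suggest("X agrees"):
    print(f"{src}  ->  {tgt}")
```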

    Results of the WMT16 Tuning Shared Task

    This paper presents the results of the WMT16 Tuning Shared Task. We provided the participants with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set, and the outputs were manually ranked for translation quality. We received 4 submissions in the Czech-English direction and 8 in the English-Czech direction. In addition, we ran 2 baseline setups, tuning the parameters with standard optimizers for the BLEU score. In contrast to previous years, the tuned systems in 2016 rely on large data.
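
    Conceptually, tuning feature weights means searching for a weight vector whose rescoring of an n-best list maximizes an automatic metric. The sketch below does this with random search and a toy overlap score standing in for BLEU; real tuning uses standard optimizers such as MERT over much larger n-best lists.

```python
# A toy weight tuner: random search over feature weights, scored by a
# crude overlap metric standing in for BLEU; everything is illustrative.
import random

def overlap(hyp, ref):
    """Toy stand-in for BLEU: fraction of reference words in the hypothesis."""
    h, r = hyp.split(), ref.split()
    return sum(w in h for w in r) / len(r)

def rescore(nbest, weights):
    """nbest: list of (hypothesis, feature_vector); pick by dot product."""
    return max(nbest, key=lambda hr: sum(w * f for w, f in zip(weights, hr[1])))[0]

def tune(nbest, ref, dims, trials=200, seed=0):
    rng = random.Random(seed)
    best_w, best_score = None, -1.0
    for _ in range(trials):
        w = [rng.uniform(-1, 1) for _ in range(dims)]
        score = overlap(rescore(nbest, w), ref)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

nbest = [("the cat sat", [0.2, -1.0]), ("a cat sat down", [0.5, -0.5])]
print(tune(nbest, "a cat sat down", dims=2))
```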