Bidirectional Decoding for Statistical Machine Translation
This paper describes the right-to-left decoding method, which translates an input string by generating the output in the right-to-left direction. In addition, it presents the bidirectional decoding method, which combines the advantages of the left-to-right and right-to-left decoding methods by generating output in both directions and merging the hypothesized partial outputs of the two. Experimental results on Japanese-English translation showed that right-to-left decoding was better for English-to-Japanese translation, while left-to-right decoding was more suitable for Japanese-to-English translation. It was also observed that the bidirectional method performed better for English-to-Japanese translation.
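As a rough illustration of the merging step, the sketch below combines left-to-right prefixes with right-to-left suffixes and keeps the highest-scoring full hypothesis. The hypothesis representation and the additive score combination are assumptions for illustration, not the paper's exact decoder.

```python
def merge_bidirectional(l2r_partials, r2l_partials, max_len=100):
    """Merge left-to-right prefixes with right-to-left suffixes.

    l2r_partials : list of (prefix_tokens, score) pairs grown left to right
    r2l_partials : list of (suffix_tokens, score) pairs grown right to left
    Returns the highest-scoring concatenation within the length limit.
    """
    best, best_score = None, float("-inf")
    for prefix, s1 in l2r_partials:
        for suffix, s2 in r2l_partials:
            if len(prefix) + len(suffix) > max_len:
                continue                 # skip combinations that overrun the limit
            score = s1 + s2              # assumption: scores combine additively
            if score > best_score:
                best, best_score = prefix + suffix, score
    return best, best_score
```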
Syntax-Directed Attention for Neural Machine Translation
The attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction, whereas local attention selectively looks at source words within a fixed window. However, the alignment weights for the current target word often decrease with linear distance to the left and right of the aligned source position, neglecting syntax-directed distance constraints. In this paper, we extend local attention with a syntax-distance constraint so that it focuses on source words syntactically related to the predicted target word, thus learning a more effective context vector for word prediction. Moreover, we propose a double-context NMT architecture, which combines a global context vector and a syntax-directed context vector computed over the global attention, to extract more translation-relevant information from the source representation. Experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system. Comment: AAAI 2018, revised version.
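The core idea can be sketched as local attention whose distance penalty is measured over the parse tree rather than by linear offset. In the snippet below, the precomputed tree-distance matrix `syntax_dist`, the Gaussian penalty, and `sigma` are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def syntax_directed_attention(scores, aligned_pos, syntax_dist, sigma=2.0):
    """Re-weight raw attention scores by syntactic distance to the aligned word.

    scores      : raw attention scores over source words, shape (src_len,)
    aligned_pos : source position aligned to the current target word
    syntax_dist : syntax_dist[i, j] = distance between source words i and j
                  in the parse tree (hypothetical, precomputed)
    sigma       : width of the Gaussian penalty (an illustrative assumption)
    """
    d = syntax_dist[aligned_pos]                        # tree distance, not linear offset
    penalty = np.exp(-(d ** 2) / (2.0 * sigma ** 2))    # Gaussian falloff over the tree
    weights = np.exp(scores - scores.max()) * penalty   # softmax numerator with penalty
    return weights / weights.sum()                      # normalized attention weights
```

In the double-context architecture described in the abstract, the context vector built from these weights would be used alongside the usual global context vector for word prediction.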
Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
Traditional neural machine translation (NMT) involves a fixed training procedure in which each sentence is sampled once per epoch. In reality, some sentences are well learned within the first few epochs; under this approach, however, the well-learned sentences continue to be trained for 10-30 epochs along with those that are not yet well learned, which wastes time. Here, we propose an efficient method that dynamically samples sentences in order to accelerate NMT training. In this approach, a weight is assigned to each sentence based on the measured difference between its training costs in two iterations, and in each epoch a certain percentage of sentences is dynamically sampled according to these weights. Empirical results on the NIST Chinese-to-English and the WMT English-to-German tasks show that the proposed method can significantly accelerate NMT training and improve NMT performance. Comment: Revised version of ACL-2018.
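A minimal sketch of the sampling step is given below, assuming weights proportional to the cost decrease between two iterations; `keep_ratio` is a hypothetical hyper-parameter, and the paper's exact weighting and sampling schedule may differ.

```python
import numpy as np

def dynamic_sample(costs_prev, costs_curr, keep_ratio=0.8):
    """Pick the sentence indices to train on in the next epoch.

    Sentences whose training cost barely decreased between two iterations
    are treated as well-learned and are sampled less often.
    """
    diff = np.asarray(costs_prev) - np.asarray(costs_curr)
    weights = np.maximum(diff, 1e-8)      # still-improving sentences get more weight
    probs = weights / weights.sum()
    k = int(len(probs) * keep_ratio)      # sample a fixed fraction of the corpus
    return np.random.choice(len(probs), size=k, replace=False, p=probs)
```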
SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation
Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT). Existing work has shown that neural sub-word segmenters are better than Byte-Pair Encoding (BPE); however, they are inefficient, as they require parallel corpora, days to train, and hours to decode. This article introduces SelfSeg, a self-supervised neural sub-word segmentation method that is much faster to train and decode and requires only monolingual dictionaries instead of parallel corpora. SelfSeg takes as input a word in the form of a partially masked character sequence, optimizes the word generation probability, and generates the segmentation with the maximum posterior probability, which is calculated using a dynamic programming algorithm. The training time of SelfSeg depends on word frequencies, and we explore several word-frequency normalization strategies to accelerate the training phase. Additionally, we propose a regularization mechanism that allows the segmenter to generate various segmentations for one word. To show the effectiveness of our approach, we conduct MT experiments in low-, middle-, and high-resource scenarios, comparing the performance of different segmentation methods. The experimental results demonstrate that, on the low-resource ALT dataset, our method achieves more than a 1.2 BLEU score improvement compared with BPE and SentencePiece, and a 1.1 score improvement over Dynamic Programming Encoding (DPE) and Vocabulary Learning via Optimal Transport (VOLT), on average. The regularization method achieves approximately a 4.3 BLEU score improvement over BPE and a 1.2 BLEU score improvement over BPE-dropout, the regularized version of BPE. We also observe significant improvements on the IWSLT15 Vi→En, WMT16 Ro→En, and WMT15 Fi→En datasets and competitive results on the WMT14 De→En and WMT14 Fr→En datasets. Furthermore, our method is 17.8× faster during training and up to 36.8× faster during decoding in a high-resource scenario compared to DPE. We provide extensive analysis, including why monolingual word-level data is enough to train SelfSeg.
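The maximum-posterior segmentation step can be illustrated with a standard Viterbi-style dynamic program. Here `subword_logp` is a hypothetical dictionary standing in for the scores a trained SelfSeg model would produce, and `max_piece_len` is an illustrative assumption.

```python
import math

def best_segmentation(word, subword_logp, max_piece_len=8):
    """Maximum-probability segmentation of `word` via dynamic programming.

    subword_logp  : dict mapping candidate sub-words to log-probabilities
                    (a stand-in for the scores a trained SelfSeg model outputs)
    max_piece_len : longest sub-word considered
    """
    n = len(word)
    best = [-math.inf] * (n + 1)   # best[i] = best log-prob of segmenting word[:i]
    back = [0] * (n + 1)           # back-pointers for reconstructing the split
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_piece_len), i):
            piece = word[j:i]
            if piece in subword_logp and best[j] + subword_logp[piece] > best[i]:
                best[i] = best[j] + subword_logp[piece]
                back[i] = j
    pieces, i = [], n              # walk the back-pointers to recover the pieces
    while i > 0:
        pieces.append(word[back[i]:i])
        i = back[i]
    return pieces[::-1], best[n]

# Example with made-up log-probabilities:
# best_segmentation("unbreakable", {"un": -1.0, "break": -2.0, "able": -1.5})
# → (['un', 'break', 'able'], -4.5)
```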
ScrumSourcing: Challenges of Collaborative Post-editing for Rugby World Cup 2019
This paper describes challenges facing the ScrumSourcing project, which aims to create a neural machine translation (NMT) service aiding interaction between Japanese- and English-speaking fans during Rugby World Cup 2019 in Japan. This is an example of "domain adaptation". The best training data for adapting NMT is a large volume of translated sentences typical of the domain. In reality, however, no such parallel data exists for rugby. The problem is compounded by a marked asymmetry between the two languages in the conventions for post-match reports, and by the almost total absence of in-match commentaries in Japanese. In post-editing the NMT output to incrementally improve quality via retraining, volunteer rugby fans will play a crucial role in determining a new genre in Japanese. To avoid de-motivating the volunteers at the outset, we undertake an initial adaptation of the system using terminological data. This paper describes the compilation of this data and its effects on the quality of the system's output.