104,898 research outputs found
Corpus Augmentation by Sentence Segmentation for Low-Resource Neural Machine Translation
Neural Machine Translation (NMT) has been proven to achieve impressive
results. The NMT system translation results depend strongly on the size and
quality of parallel corpora. Nevertheless, for many language pairs, no
rich-resource parallel corpora exist. As described in this paper, we propose a
corpus augmentation method by segmenting long sentences in a corpus using
back-translation and generating pseudo-parallel sentence pairs. The experiment
results of the Japanese-Chinese and Chinese-Japanese translation with
Japanese-Chinese scientific paper excerpt corpus (ASPEC-JC) show that the
method improves translation performance.Comment: 4 pages. The version before Applied. Science
Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts
Recently, machine translation systems based on neural networks have reached state-of-the-art results for some pairs of languages (e.g., German–English). In this paper, we are investigating the performance of neural machine translation in Chinese–Spanish, which is a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of: words or characters and their corresponding bitmap fonts. The fact of performing the interpretation of every word or character as a bitmap font generates more informed vectorial representations. Best results are obtained when using words plus their bitmap fonts obtaining an improvement (over a competitive neural MT baseline system) of almost six BLEU, five METEOR points and ranked coherently better in the human evaluation.Peer ReviewedPostprint (published version
Neural machine translation using bitmap fonts
Recently, translation systems based on neural networks are starting to compete with systems based on phrases. The systems which are based on neural networks use vectorial repre- sentations of words. However, one of the biggest challenges that machine translation still faces, is dealing with large vocabularies and morphologically rich languages. This work aims to adapt a neural machine translation system to translate from Chinese to Spanish, using as input different types of granularity: words, characters, bitmap fonts of Chinese characters or words. The fact of performing the interpretation of every character or word as a bitmap font allows for obtaining more informed vectorial representations. Best results are obtained when using the information of the word bitmap font.Postprint (published version
Neural System Combination for Machine Translation
Neural machine translation (NMT) becomes a new approach to machine
translation and generates much more fluent results compared to statistical
machine translation (SMT).
However, SMT is usually better than NMT in translation adequacy. It is
therefore a promising direction to combine the advantages of both NMT and SMT.
In this paper, we propose a neural system combination framework leveraging
multi-source NMT, which takes as input the outputs of NMT and SMT systems and
produces the final translation.
Extensive experiments on the Chinese-to-English translation task show that
our model archives significant improvement by 5.3 BLEU points over the best
single system output and 3.4 BLEU points over the state-of-the-art traditional
system combination methods.Comment: Accepted as a short paper by ACL-201
A Client mobile application for Chinese-Spanish statistical machine translation
This show and tell paper describes a client mobile application for Chinese-Spanish machine translation. The system combines a standard server-based statistical machine translation (SMT) system, which requires online operation, with different input modalities including text, optical character recognition (OCR) and automatic speech recognition (ASR). It also includes an index-based search engine for supporting off-line translation.Postprint (published version
Supervised Attentions for Neural Machine Translation
In this paper, we improve the attention or alignment accuracy of neural
machine translation by utilizing the alignments of training sentence pairs. We
simply compute the distance between the machine attentions and the "true"
alignments, and minimize this cost in the training procedure. Our experiments
on large-scale Chinese-to-English task show that our model improves both
translation and alignment qualities significantly over the large-vocabulary
neural machine translation system, and even beats a state-of-the-art
traditional syntax-based system.Comment: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap
with arXiv:1605.0314
- …