Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora
The Chinese language has evolved considerably over its long history, so
native speakers now have trouble reading sentences written in ancient
Chinese. In this paper, we propose an end-to-end neural model
to automatically translate between ancient and contemporary Chinese. However,
the existing ancient-contemporary Chinese parallel corpora are not aligned at
the sentence level and sentence-aligned corpora are limited, which makes it
difficult to train the model. To build sentence-level parallel training
data for the model, we propose an unsupervised algorithm that constructs
sentence-aligned ancient-contemporary pairs, exploiting the fact that aligned
sentence pairs share many tokens. Based on the aligned corpus, we propose an
end-to-end neural model with a copying mechanism and local attention to
translate between ancient and contemporary Chinese. Experiments show that
the proposed unsupervised algorithm achieves a 99.4% F1 score for sentence
alignment, and the translation model achieves 26.95 BLEU from ancient to
contemporary Chinese and 36.34 BLEU from contemporary to ancient.

Comment: Accepted by NLPCC 201
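The token-overlap idea behind the alignment algorithm can be sketched as a greedy matcher that pairs each ancient sentence with the contemporary sentence whose characters overlap most. This is an illustrative simplification under assumed names (`char_overlap`, `align`, the 0.3 threshold), not the paper's actual unsupervised algorithm:

```python
def char_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the character sets of two sentences."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def align(ancient: list[str], contemporary: list[str],
          threshold: float = 0.3) -> list[tuple[str, str]]:
    """Greedily pair each ancient sentence with the most
    overlapping contemporary sentence, if overlap is high enough."""
    pairs = []
    for anc in ancient:
        best = max(contemporary, key=lambda c: char_overlap(anc, c),
                   default=None)
        if best is not None and char_overlap(anc, best) >= threshold:
            pairs.append((anc, best))
    return pairs
```

Because an ancient sentence and its contemporary translation often share many characters verbatim, even this crude overlap score separates true pairs from unrelated sentences.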
Application of Neural Machine Translation with Attention Mechanism for Translation of Indonesian to Seram Language (Geser)
The Seram language (Geser) is one of the regional languages of Kabupaten Seram Bagian Timur in Maluku Province and has been classified by the Language Office as endangered. This study applies Neural Machine Translation (NMT) in an effort to preserve the Seram (Geser) language. NMT has proven more effective than SMT at overcoming the challenges of language translation, using an attention mechanism to improve translation accuracy. The data used in this study were collected through interviews: a parallel corpus of 3538 sentence pairs, with vocabularies of 255 Indonesian and 269 Seram (Geser) words. On 708 test sentences containing no Out-of-Vocabulary (OOV) words, the model achieved a BLEU score of 0.90518992895191, or 90.518%.
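The attention mechanism referenced above lets the decoder weight encoder states by relevance at each step. A minimal dot-product attention sketch (an illustrative simplification, not the study's exact architecture):

```python
import numpy as np

def attention(query: np.ndarray, keys: np.ndarray,
              values: np.ndarray) -> np.ndarray:
    """Dot-product attention: score each source position by
    query-key similarity, softmax-normalize, and return the
    weighted sum of the encoder value vectors."""
    scores = keys @ query                      # shape (T,)
    weights = np.exp(scores - scores.max())    # stable softmax
    weights /= weights.sum()
    return weights @ values                    # context vector
```

With a zero query, every source position scores equally and the context vector reduces to the mean of the values, which makes the weighting easy to sanity-check.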