The Annals of Joseon Dynasty (AJD) contain the daily records of the Kings of
Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals
were originally written in an archaic Korean writing system, `Hanja', and were
translated into Korean from 1968 to 1993. The resulting translation was,
however, too literal and contained many archaic Korean words; thus, a new expert
translation effort began in 2012. In the decade since, the records of only one
king have been completed. In parallel, expert translators are working on an
English translation, also at a slow pace, and have produced only one king's
records in English so far. Thus, we propose H2KE, a neural machine translation
model that translates historical documents in Hanja into more easily
understandable Korean and into English. Built on top of multilingual neural
machine translation,
H2KE learns to translate historical documents written in Hanja from both a
full dataset of outdated Korean translations and a small dataset of more
recent contemporary Korean and English translations. We compare our method
against two baselines: a recent model that simultaneously learns to restore and
translate Hanja historical documents, and a Transformer-based model trained only
on newly translated corpora. The experiments reveal that our method
significantly outperforms the baselines in terms of BLEU scores for both
contemporary Korean and English translations. We further conduct an extensive
human evaluation, which shows that our translation is preferred over the
original expert translations by both experts and non-expert Korean speakers.

Comment: 2022 EMNLP Findings