64 research outputs found
Identifying Semantic Divergences in Parallel Text without Annotations
Recognizing that even correct translations are not always semantically
equivalent, we automatically detect meaning divergences in parallel sentence
pairs with a deep neural model of bilingual semantic similarity which can be
trained for any parallel corpus without any manual annotation. We show that our
semantic model detects divergences more accurately than models based on surface
features derived from word alignments, and that these divergences matter for
neural machine translation.Comment: Accepted as a full paper to NAACL 201
Text Style Transfer Back-Translation
Back Translation (BT) is widely used in the field of machine translation, as
it has been proved effective for enhancing translation quality. However, BT
mainly improves the translation of inputs that share a similar style (to be
more specific, translation-like inputs), since the source side of BT data is
machine-translated. For natural inputs, BT brings only slight improvements and
sometimes even adverse effects. To address this issue, we propose Text Style
Transfer Back Translation (TST BT), which uses a style transfer model to modify
the source side of BT data. By making the style of source-side text more
natural, we aim to improve the translation of natural inputs. Our experiments
on various language pairs, including both high-resource and low-resource ones,
demonstrate that TST BT significantly improves translation performance against
popular BT benchmarks. In addition, TST BT is proved to be effective in domain
adaptation so this strategy can be regarded as a general data augmentation
method. Our training code and text style transfer model are open-sourced.Comment: acl2023, 14 pages, 4 figures, 19 table
- …