64 research outputs found

Identifying Semantic Divergences in Parallel Text without Annotations

Authors: Carpuat Marine, Niu Xing, Vyas Yogarshi
Publication date: 01/01/2018

Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.

Comment: Accepted as a full paper to NAACL 201

arXiv.org e-Print Archive

Crossref

Text Style Transfer Back-Translation

Authors: Chen Xiaoyu, Guo Jiaxin, Li Zongyao, Shang Hengchao, Wang Minghan, Wei Daimeng, Wu Zhanglin, Yang Hao, Yu Zhengzhe
Publication date: 02/06/2023

Back Translation (BT) is widely used in the field of machine translation, as it has proven effective for enhancing translation quality. However, BT mainly improves the translation of inputs that share a similar style (more specifically, translation-like inputs), since the source side of BT data is machine-translated. For natural inputs, BT brings only slight improvements and sometimes even adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer model to modify the source side of BT data. By making the style of source-side text more natural, we aim to improve the translation of natural inputs. Our experiments on various language pairs, including both high-resource and low-resource ones, demonstrate that TST BT significantly improves translation performance against popular BT benchmarks. In addition, TST BT proves effective in domain adaptation, so this strategy can be regarded as a general data augmentation method. Our training code and text style transfer model are open-sourced.

Comment: acl2023, 14 pages, 4 figures, 19 table

arXiv.org e-Print Archive

The University of Edinburgh’s English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task

Authors: Bawden Rachel, Birch Alexandra, Dobreva Radina, Miceli Barone Antonio Valerio, Oncevay Marcos Arturo, Williams Philip
Publication date: 19/11/2020

Edinburgh Research Explorer