6,874 research outputs found

    HanoiT: Enhancing Context-aware Translation via Selective Context

    Context-aware neural machine translation aims to use document-level context to improve translation quality. However, not all words in the context are helpful: irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context. To mitigate this problem, we propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism that sifts and refines the long document context. To verify the effectiveness of our method, we conduct extensive experiments and additional quantitative analysis on four document-level machine translation benchmarks. The experimental results demonstrate that, via the soft selection mechanism, our model significantly outperforms previous models on all datasets.
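    The layer-wise selection idea can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch encoder layer (not the authors' released code): each layer computes a sigmoid gate per auxiliary-context token and softly downweights tokens it judges irrelevant before they reach the next layer. All names, dimensions, and the gate parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftSelectEncoderLayer(nn.Module):
    """One encoder layer with a soft selection gate over context tokens.

    Hypothetical re-implementation of the idea in the abstract: score every
    auxiliary-context token and downweight the ones judged irrelevant
    before passing hidden states to the next layer.
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Scalar gate per token: sigmoid(w . h) in [0, 1].
        self.gate = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor, context_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); context_mask: (batch, seq) bool,
        # True where the token belongs to the auxiliary document context.
        h = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
        h = self.norm2(h + self.ffn(h))
        # Soft selection: scale context tokens by their gate; leave
        # current-sentence tokens untouched.
        g = torch.sigmoid(self.gate(h))
        keep = torch.where(context_mask.unsqueeze(-1), g, torch.ones_like(g))
        return h * keep
```

    Because the gate is a soft, differentiable scalar rather than a hard mask, stacking such layers stays trainable end-to-end, which matches the end-to-end framing in the abstract.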

    Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

    Modeling discourse -- the linguistic phenomena that go beyond individual sentences -- is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on intra-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge the gap, we propose Disco-Bench, a benchmark that can evaluate inter-sentence discourse properties across a diverse set of NLP tasks covering understanding, translation, and generation. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena (e.g., cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge. In total, we evaluate 20 general-purpose, in-domain, and commercial models based on the Transformer, advanced pretraining architectures, and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark, and (2) that fine-grained pretraining on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope can significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench
    Comment: Zhaopeng Tu is the corresponding author.
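    The diagnostic test suite described above can be read as a contrastive scoring protocol: a model passes an item if it assigns a higher score to the discourse-coherent candidate than to a perturbed one. Below is a minimal, hypothetical sketch of that style of evaluation using Hugging Face transformers with GPT-2 as a stand-in scorer; the example pair, the choice of model, and the pass criterion are all illustrative assumptions, and the real testsets ship with the Disco-Bench repository linked above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative item; the real diagnostic data ships with the repository
# at https://github.com/longyuewangdcu/Disco-Bench.
CONTEXT = "Mary lost her keys on the way home."
CANDIDATES = [
    "She had to wait outside until her roommate arrived.",  # coherent
    "He enjoys collecting vintage stamps on weekends.",     # incoherent
]

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `context`."""
    ctx = tok(context, return_tensors="pt").input_ids
    cont = tok(" " + continuation, return_tensors="pt").input_ids
    ids = torch.cat([ctx, cont], dim=1)
    with torch.no_grad():
        logprobs = model(ids).logits.log_softmax(-1)
    # Logits at position i predict token i+1, so continuation tokens
    # (positions ctx_len .. end) are scored by positions ctx_len-1 .. end-1.
    start = ctx.shape[1]
    tgt = ids[0, start:]
    return logprobs[0, start - 1 : -1].gather(-1, tgt.unsqueeze(-1)).sum().item()

scores = [continuation_logprob(CONTEXT, c) for c in CANDIDATES]
# The model "passes" this item if the coherent candidate scores higher.
print(scores[0] > scores[1])
```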