Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
  Modelling

Cai, Deng; Cui, Leyang; Du, Zefeng; Jiang, Haiyun; Liu, Donghuai; Shi, Shuming; Tu, Zhaopeng; Wang, Longyue; Wang, Yan; Yu, Dian

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

Authors: Deng Cai
Leyang Cui
Zefeng Du
Haiyun Jiang
Donghuai Liu
Shuming Shi
Zhaopeng Tu
Longyue Wang
Yan Wang
Dian Yu
Publication date: 21 July 2023
Publisher

Abstract

Modeling discourse -- the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on the evaluation of inter-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge the gap, we propose Disco-Bench, a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks, covering understanding, translation, and generation. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena (e.g. cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge. We totally evaluate 20 general-, in-domain and commercial models based on Transformer, advanced pretraining architectures and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark; (2) fine-grained pretraining based on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope can significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench.Comment: Zhaopeng Tu is the corresponding autho

Similar works

Full text

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2307.08074

Last time updated on 20/07/2023