2 research outputs found
NCLS: Neural Cross-Lingual Summarization
Cross-lingual summarization (CLS) is the task of producing a summary in one
language for a source document written in a different language. Existing
methods simply divide this task into two steps, summarization and translation,
which leads to error propagation. To address this, we present the first
end-to-end CLS framework, which we refer to as Neural Cross-Lingual
Summarization (NCLS). Moreover, we propose to further
improve NCLS by incorporating two related tasks, monolingual summarization and
machine translation, into the training process of CLS under multi-task
learning. Due to the lack of supervised CLS data, we propose a round-trip
translation strategy to acquire two high-quality large-scale CLS datasets based
on existing monolingual summarization datasets. Experimental results show
that NCLS achieves remarkable improvements over traditional pipeline methods
on both English-to-Chinese and Chinese-to-English human-corrected CLS test
sets. In addition, NCLS with multi-task learning further significantly
improves the quality of the generated summaries. We make our dataset and code
publicly available here: http://www.nlpr.ia.ac.cn/cip/dataset.htm
Comment: Accepted to EMNLP-IJCNLP 2019
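The round-trip translation strategy above can be sketched in a few lines: translate each monolingual reference summary into the target language, translate it back, and keep the pair only if the back-translation stays close to the original summary. The sketch below is illustrative, not the authors' released code; `translate`, `rouge_f1`, and the threshold value are assumed placeholders.

```python
# Minimal sketch of a round-trip translation filter for building CLS data.
# Assumptions (illustrative, not from the paper text): `translate` wraps an
# arbitrary MT system and `rouge_f1` wraps a ROUGE scorer.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for any machine translation system."""
    raise NotImplementedError

def rouge_f1(hypothesis: str, reference: str) -> float:
    """Placeholder for a ROUGE F1 scorer."""
    raise NotImplementedError

def build_cls_pairs(mono_pairs, src="en", tgt="zh", threshold=0.45):
    """Turn (document, summary) pairs in the source language into
    (document, target-language summary) CLS pairs, keeping only samples
    whose summary survives a round trip through MT well enough."""
    cls_pairs = []
    for doc, summary in mono_pairs:
        tgt_summary = translate(summary, src, tgt)       # forward pass
        back_summary = translate(tgt_summary, tgt, src)  # round trip
        # Keep the pair only if the back-translation stays close to the
        # original summary, i.e. MT did not distort it too much.
        if rouge_f1(back_summary, summary) >= threshold:
            cls_pairs.append((doc, tgt_summary))
    return cls_pairs
```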
WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
We introduce WikiLingua, a large-scale, multilingual dataset for the
evaluation of cross-lingual abstractive summarization systems. We extract
article and summary pairs in 18 languages from WikiHow, a high-quality,
collaborative resource of how-to guides on a diverse set of topics written by
human authors. We create gold-standard article-summary alignments across
languages by aligning the images that are used to describe each how-to step in
an article. As a set of baselines for further studies, we evaluate the
performance of existing cross-lingual abstractive summarization methods on our
dataset. We further propose a method for direct cross-lingual summarization
(i.e., without requiring translation at inference time) by leveraging synthetic
data and Neural Machine Translation as a pre-training step. Our method
significantly outperforms the baseline approaches while being more
cost-efficient during inference.
Comment: Findings of EMNLP 2020
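The image-based alignment described above relies on the fact that different language editions of a WikiHow article illustrate corresponding steps with the same images, so matching image identifiers pairs up steps across languages. A minimal sketch of that idea follows; the `Step` structure and field names are assumptions for illustration, not the released WikiLingua code.

```python
# Minimal sketch of image-based step alignment across languages.
# Assumption (illustrative): each parsed article is a list of steps, and
# every step carries the identifier (e.g. filename) of the image that
# illustrates it, shared across language editions of the same article.

from dataclasses import dataclass

@dataclass
class Step:
    image_id: str   # shared across language editions of the article
    paragraph: str  # the how-to step text in this language
    summary: str    # the one-line summary of the step

def align_steps(steps_a: list[Step], steps_b: list[Step]) -> list[tuple[Step, Step]]:
    """Pair up steps from two language editions of one article by the
    image that illustrates each step."""
    by_image = {step.image_id: step for step in steps_b}
    return [(step, by_image[step.image_id])
            for step in steps_a
            if step.image_id in by_image]

# An aligned (English step, Spanish step) pair then yields a gold
# cross-lingual example: the Spanish paragraph paired with the English
# step summary, and vice versa.
```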