20 Minuten: A Multi-task News Summarisation Dataset for German

Abstract

Automatic text summarisation (ATS) is a central task in natural language processing that aims to condense a long document into a shorter summary that conveys its key points. Extractive approaches to ATS, which identify and copy the most important sentences or phrases from the original text, have long been a popular choice, but the resulting summaries are often incohesive and disjointed. More recently, abstractive approaches to ATS have gained popularity thanks to advances in neural text generation. Yet much of the research on ATS has been limited to English because of its dominance as a high-resource language. This work introduces a new dataset for German-language news summarisation. Aside from summarisation, the dataset also supports additional NLP tasks such as image caption generation and reading time prediction. Furthermore, it is multi-purpose, since the article summaries cover a range of styles, including headlines, lead paragraphs and bullet-point summaries. To showcase the versatility of our dataset for different NLP tasks, we conduct experiments using mT5 [2] and compare performance on six different tasks under single- and multi-task fine-tuning conditions, providing baselines for future work. Our findings show that dedicated models consistently perform better according to automatic metrics.


    Last time updated on 02/08/2023