Discourse Segmentation of German Text with Pretrained Language Models

Frenzel, Steffen; Krupop, Maximilian; Stede, Manfred

research article

oai:jlcl.org:article/306

Authors: Steffen Frenzel, Maximilian Krupop, Manfred Stede
Publication date: 2 February 2026
Publisher: German Society for Computational Linguistics and Language Technology (GSCL)

Abstract

Segmenting text into "elementary discourse units" (EDUs) is relevant for several NLP applications, including discourse parsing and argument mining. In recent years, EDU segmentation has been addressed as part of a shared task on multilingual discourse parsing ("DISRPT"), where BERT-based encoder models proved particularly successful. German has been represented in DISRPT by the Potsdam Commentary Corpus, but more German data with EDU segmentation has recently been published. In this paper, we conduct detailed tests on the currently available German-language datasets. We test a multilingual off-the-shelf model, several BERT-based encoders, and the current generation of LLMs. The results are analyzed both qualitatively and quantitatively and are compared to the multilingual state of the art. We are making the best-performing model available as a tool that can be used by the community.
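
For readers unfamiliar with the task, EDU segmentation is commonly framed as token-level boundary tagging: each token is labeled as either starting a new EDU or not, and systems are scored on how well the predicted boundaries match the gold ones. The following is a minimal sketch of that framing only, not the authors' model or evaluation code. The example sentence, the 0/1 label encoding, and the function names `edus_from_labels` and `boundary_prf` are assumptions made for illustration.

```python
from typing import List, Tuple


def edus_from_labels(tokens: List[str], starts: List[int]) -> List[str]:
    """Group tokens into EDUs; starts[i] == 1 marks the first token of an EDU."""
    edus: List[List[str]] = []
    for token, is_start in zip(tokens, starts):
        if is_start or not edus:
            edus.append([token])      # open a new segment
        else:
            edus[-1].append(token)    # extend the current segment
    return [" ".join(edu) for edu in edus]


def boundary_prf(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over tokens labeled as EDU starts."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: "The city saves money, because the funds are missing."
tokens = ["Die", "Stadt", "spart", ",", "weil", "das", "Geld", "fehlt", "."]
gold = [1, 0, 0, 0, 1, 0, 0, 0, 0]   # assumed gold EDU starts
pred = [1, 0, 0, 1, 1, 0, 0, 0, 0]   # assumed system output with one spurious start

print(edus_from_labels(tokens, gold))
print(boundary_prf(gold, pred))
```

In this toy case the gold labels yield two EDUs, and the spurious boundary at the comma lowers precision while recall stays perfect. A trained segmenter, whether an encoder with a tagging head or a prompted LLM, would supply `pred`; the scoring logic would stay the same.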

Last time updated on 02/05/2026

This paper was published in Journal for Language Technology and Computational Linguistics (JLCL).

Licence: https://creativecommons.org/licenses/by-sa/4.0