Journal for Language Technology and Computational Linguistics (JLCL)
    255 research outputs found

    AI Explainability in Classifying Political Speeches and Interviews

    This study applies explainable AI techniques to understand the linguistic features involved in classifying speeches and interviews in political discourse, a field where transparency is sensitive. Using a feature-based Linguistic-Rule-Based Model (LRBM), logistic regression, Transformer-based models, and SHAP values, we create a more interpretable version of the predictions made by BERT models in this natural language processing (NLP) binary classification task. The study explores the role that recognizable linguistic features play in both feature-based and neural models. Specifically, it examines the extent to which BERT models depend on linguistic structures for their predictions, using NER anonymization to reduce reliance on thematic context. Building on findings from classic and modern linguistic literature, and in addition to improving the interpretability of neural models, the study identifies important global “political discourse features” that distinguish speeches from interviews: nominalization frequency, discourse marker frequency, personal pronoun frequency, and interjection frequency.
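As a rough illustration of what frequency features of this kind look like, the sketch below computes relative frequencies (count per token) for two of the four features using small hand-written English word lists. The word lists, the regex tokenizer and the function name `feature_frequencies` are hypothetical and are not the study's LRBM; a real system would rely on a POS tagger and the paper's own feature definitions.

```python
import re

# Hypothetical toy lexicons for illustration only; not the study's feature definitions.
PERSONAL_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
                     "me", "him", "her", "us", "them"}
INTERJECTIONS = {"oh", "uh", "um", "ah", "hey", "wow"}


def feature_frequencies(text: str) -> dict[str, float]:
    """Return each feature's count divided by the number of tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude lowercase word tokenizer
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "personal_pronoun_freq": sum(t in PERSONAL_PRONOUNS for t in tokens) / n,
        "interjection_freq": sum(t in INTERJECTIONS for t in tokens) / n,
    }


print(feature_frequencies("Well, uh, we think they are right"))
```

Vectors of such frequencies can be passed directly to a logistic regression classifier, whose per-feature weights are easier to inspect than the internals of a BERT model.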

    Discourse Segmentation of German Text with Pretrained Language Models

    Segmenting text into so-called "elementary discourse units" (EDUs) is a task relevant to several NLP applications, including discourse parsing and argument mining. In recent years, EDU segmentation has been addressed as part of a shared task on multilingual discourse parsing ("DISRPT"), where BERT-based encoder models proved particularly successful. German has been represented in DISRPT by the Potsdam Commentary Corpus, but more German data with EDU segmentation has recently been published. In this paper, we conduct detailed tests on the German-language datasets that are currently available. We test a multilingual off-the-shelf model, several BERT-based encoders, and the current generation of LLMs. The results are analyzed both qualitatively and quantitatively and compared to the multilingual state of the art. We are making the best-performing model available as a tool that can be used by the community.
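EDU segmentation is commonly framed as token classification: a model labels each token as either beginning a new EDU or not, and the segments are then read off those labels. The sketch below shows only that last step. The boolean label format and the function name `labels_to_edus` are assumptions for illustration, not the interface of any particular model or dataset.

```python
def labels_to_edus(tokens: list[str], begins: list[bool]) -> list[list[str]]:
    """Group tokens into EDUs; begins[i] is True if token i starts a new EDU."""
    if len(tokens) != len(begins):
        raise ValueError("tokens and labels must have the same length")
    edus: list[list[str]] = []
    for token, begin in zip(tokens, begins):
        # Open a new segment at a predicted boundary, or at the very first token.
        if begin or not edus:
            edus.append([])
        edus[-1].append(token)
    return edus


tokens = ["Ich", "bleibe", "zu", "Hause", ",", "weil", "es", "regnet", "."]
begins = [True, False, False, False, False, True, False, False, False]
print(labels_to_edus(tokens, begins))
```

Here the boundary before "weil" splits the sentence into a main clause and a causal subordinate clause, the kind of split an EDU segmenter is expected to make.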

    Do LLMs fail in bridging generation?

    In this work, we investigate whether large language models (LLMs) ‘understand’ bridging relations and can use this knowledge effectively. We present results from two tasks: generating texts that contain bridging and filling in missing bridging spans. We show that in most cases LLMs fail to generate bridging reliably.

    Exploring the Limits of LLMs for German Text Classification: Prompting and Fine-tuning Strategies Across Small and Medium-sized Datasets

    Large Language Models (LLMs) are highly capable, state-of-the-art technologies that are widely used as text classifiers for various NLP tasks, including sentiment analysis, topic classification, and legal document analysis. In this paper, we present a systematic analysis of the performance of LLMs as text classifiers using five German social media datasets covering 13 different tasks. We investigate zero-shot (ZSC) and few-shot classification (FSC) approaches with multiple LLMs and compare them with fine-tuned models based on Llama-3.2, EuroLLM, Teuken and BübleLM. We focus on investigating the limits of LLMs and on accurately describing our findings and the overall challenges.

    Can we Operationalize Conceptual Metaphor Cross-Lingually?

    The conceptual nature of metaphorical expression is a long-discussed phenomenon, investigated extensively by linguists, psychologists, translators, and philosophers, among others. In theoretical work, distinctions are made between conceptual metaphors (a phenomenon of human cognition) and linguistic metaphors (their concrete realizations in language), while most computational approaches have addressed only the latter. In the age of massive language models, metaphor and other phenomena of figurative speech are receiving new attention as more and more textual analyses are built on neural-network tools that do not necessarily distinguish between the lexicalization of a concept and the concept itself. An investigation of conceptual metaphor from a more linguistics-driven perspective is therefore important. In this work, we investigate the conceptuality of metaphoric expressions across two languages using a parallel corpus of news commentaries from the web. We assume that a conceptual metaphor is represented by many instances of linguistic metaphors; this presupposes that linguistic metaphor can serve as an operationalization of conceptual metaphor. We perform several tests on how metaphors are translated between the languages, to assess whether distinct lexicalizations of a metaphor form conceptual clusters, and whether the usage of words in a metaphorical context is distinguishable from their usage in literal contexts. We find that we are able to group linguistic metaphors in one language into semantically related sets by clustering their translations in another language. We argue that these semantically related sets constitute an operationalization of conceptual metaphors. In English, the clusters are formed by fewer but more diverse lexelts (linguistic types), while in German we find more and bigger clusters composed primarily of derivatives and compounds. We also find that when a lexelt is translated in unannotated instances the same way as in its known metaphoric usages, its contextual sense tends to be figurative as well.
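One simple way to picture grouping metaphors via their translations is to link source-language words that share a translation and treat the resulting connected groups as clusters. The sketch below does this with a union-find structure. The input format, the function name `cluster_by_translation` and the German-English toy pairs are hypothetical, and the paper's actual clustering method may well differ.

```python
from collections import defaultdict


def cluster_by_translation(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group source words linked, directly or transitively, by shared translations."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Remember the first source word seen for each translation.
    first_source: dict[str, str] = {}
    for src, tgt in pairs:
        find(src)  # register the source word even if it ends up alone
        if tgt in first_source:
            union(src, first_source[tgt])
        else:
            first_source[tgt] = src

    groups: dict[str, set[str]] = defaultdict(set)
    for src in list(parent):
        groups[find(src)].add(src)
    return list(groups.values())


pairs = [("Flut", "flood"), ("Überschwemmung", "flood"),
         ("Welle", "wave"), ("Flüchtlingswelle", "wave"),
         ("Welle", "surge"), ("Ansturm", "surge")]
print(cluster_by_translation(pairs))
```

Transitive linking like this is deliberately crude: a single ambiguous translation can merge unrelated groups, so a similarity-based clustering would usually be more robust.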

    GPT makes a poor AMR parser

    This paper evaluates GPT models as out-of-the-box Abstract Meaning Representation (AMR) parsers using prompt-based strategies, including 0-shot, few-shot, Chain-of-Thought (CoT), and a two-step approach in which core arguments and non-core roles are handled separately. Our results show that GPT-3.5 and GPT-4o fall well short of state-of-the-art parsers, with a maximum Smatch score of 60 using GPT-4o in a 5-shot setting. While CoT prompting provides some interpretability, it does not improve performance. We further conduct fine-grained evaluations, revealing GPT’s limited ability to handle AMR-specific linguistic structures and complex semantic roles. Our findings suggest that, despite recent advances, GPT models are not yet suitable as standalone AMR parsers.
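For readers unfamiliar with the metric: Smatch treats each AMR graph as a set of triples, searches (approximately, by hill climbing) for the variable mapping between the predicted and gold graphs that maximizes the number of matching triples m, and reports the resulting F-score. Writing T_pred and T_gold for the two triple sets:

```latex
P = \frac{m}{|T_{\text{pred}}|}, \qquad
R = \frac{m}{|T_{\text{gold}}|}, \qquad
\mathrm{Smatch} = F_1 = \frac{2PR}{P + R}
```

A score of 60 therefore corresponds to F1 = 0.60, for example precision and recall both at 0.60.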

    Editorial


    A Study of Errors in the Output of Large Language Models for Domain-Specific Few-Shot Named Entity Recognition

    This paper proposes an error classification framework for a comprehensive analysis of the output that large language models (LLMs) generate in a few-shot named entity recognition (NER) task in a specialised domain. The framework should be seen as an exploratory analysis complementary to established performance metrics for NER classifiers, such as the F1 score, as it accounts for outcomes that are possible in a few-shot, LLM-based NER task. By categorising and quantitatively assessing incorrect named entity predictions, the paper shows how the proposed error classification could support a deeper cross-model and cross-prompt performance comparison, and it offers a roadmap for a guided qualitative error analysis.
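To make the idea of categorising predictions concrete, the sketch below assigns one category to each entity an LLM returns, using character offsets. The categories (correct, wrong type, boundary error, spurious, and "not in text" for output that does not occur verbatim in the input) form a simplified, hypothetical scheme in the spirit of common NER error taxonomies, not the framework proposed in the paper; `Entity` and `classify_prediction` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


def classify_prediction(text: str, surface: str, label: str, gold: list[Entity]) -> str:
    """Assign a (hypothetical) error category to one LLM-predicted entity."""
    start = text.find(surface)
    if not surface or start == -1:
        # The LLM returned a string that does not occur verbatim in the input,
        # e.g. a paraphrase, a normalised form or a hallucination.
        return "not_in_text"
    end = start + len(surface)  # simplification: first occurrence only

    for g in gold:
        if (g.start, g.end) == (start, end):
            return "correct" if g.label == label else "wrong_type"
    for g in gold:
        if start < g.end and g.start < end:  # overlapping but not identical span
            return "boundary_error"
    return "spurious"


text = "Aspirin reduces fever in adults."
gold = [Entity(0, 7, "DRUG"), Entity(16, 21, "SYMPTOM")]
print(classify_prediction(text, "fever in", "SYMPTOM", gold))
```

Locating the surface string with `str.find` only matches its first occurrence, and missed gold entities (false negatives) would have to be counted separately by checking which gold spans received no exact prediction.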

    The Struggles of Large Language Models with Zero- and Few-Shot (Extended) Metaphor Detection

    Extended metaphor is the use of multiple metaphoric words that express the same domain mapping. Although it would provide valuable insight for computational metaphor processing, the detection of extended metaphor has been largely neglected. We fill this gap with a series of zero- and few-shot experiments on detecting all linguistic metaphors and, specifically, extended metaphors with LLaMa and GPT models. We find that no model was able to achieve satisfactory performance on either task, and that LLaMa in particular showed problematic overgeneralization tendencies. Moreover, our error analysis showed that LLaMa is not sufficiently able to construct the domain mappings relevant for metaphor understanding.

    Post hoc implementation of non-standard phonetic features in the context of aphasic speech analysis

    Despite current progress, automatic speech recognition (ASR) often struggles with non-standard speech, for example speech influenced by dialectal or pathological features. (Re)training ASR models to accommodate these variations is not always possible due to limited data. This paper proposes applying knowledge about non-standard (aphasic and dialectal) phonetic features to the ASR transcription post hoc. Using speech data from German speakers with aphasia who speak the Thuringian-Upper Saxon dialect, this study evaluates the impact of these modifications on an ASR-based error analysis pipeline. The approach helps to reduce automatic error rates on the recordings manually labelled as error-free. The pipeline's performance also improves in both the general acceptance or rejection of responses and error attribution. General acceptance/rejection accuracy reaches a mean of 83.3%, which is considered sufficient for use in a digital application supporting speech and language therapy.

    233 full texts
    255 metadata records
    Updated in the last 30 days.
    Journal for Language Technology and Computational Linguistics (JLCL) is based in Germany.