
    Detecting Racist Language with BERT - A Study Using German Plenary Minutes

    Racist language is used time and again in plenary debates of the German Bundestag. Against the backdrop of the Black Lives Matter demonstrations, the right-wing extremist terrorist attack in Hanau, and the Islamist terrorist attacks in France and Germany during the turbulent year 2020, the need to engage with racist political language has become ever more apparent. Plenary sessions are usually very long and hard to follow, and hardly anyone keeps track of every debate and speech. Given the sheer volume of text in plenary minutes, maintaining an overview and identifying and criticising racist language in a timely manner appears all but impossible. What is needed, therefore, is a tool that processes and understands the text of plenary minutes and automatically detects racist language. One candidate for such a tool is the Transformer-based BERT, which currently represents the state of the art in NLP. This thesis evaluates whether and how BERT can be used for successful binary text classification to identify racist language in plenary minutes. To this end, the thesis first examines racism and racist political language in order to develop a working definition of each. After covering the theoretical foundations of neural networks across architectures such as RNNs, LSTMs, and Transformers, the workings of BERT are discussed in more detail. In the practical part of the thesis, two text corpora, designed to be as differentiated as possible, are constructed on the basis of the established working definitions of racism and racist language. Five experiments are carried out on these corpora to shed light on the research questions. The results show that there is real potential for a BERT model that identifies racist language in German plenary minutes. Nevertheless, there are still many ways to improve the model, and these should be exploited before any actual deployment in politics.
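    As a rough illustration of the kind of binary classification setup evaluated here, the sketch below fine-tunes a German BERT checkpoint with the Hugging Face transformers library; the checkpoint name (bert-base-german-cased), the placeholder corpus, the label scheme, and all hyperparameters are assumptions for illustration and are not taken from the thesis.

# Hypothetical minimal sketch: fine-tuning a German BERT checkpoint for binary
# classification of parliamentary speech segments. Corpus, labels, and
# hyperparameters are illustrative placeholders, not the thesis's setup.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class SpeechSegmentDataset(Dataset):
    """Pairs of (speech segment, label); label 1 = racist language, 0 = not."""
    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return {"input_ids": self.enc["input_ids"][i],
                "attention_mask": self.enc["attention_mask"][i],
                "labels": self.labels[i]}

tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=2)

# Placeholder corpus; in practice this would come from annotated plenary minutes.
train_texts = ["...", "..."]
train_labels = [0, 1]
loader = DataLoader(SpeechSegmentDataset(train_texts, train_labels, tokenizer),
                    batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)          # cross-entropy loss computed internally
        out.loss.backward()
        optimizer.step()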

    Exploring Syntactic Representations in Pre-trained Transformers to Improve Neural Machine Translation by a Fusion of Neural Network Architectures

    Neural networks in Machine Translation (MT) engines may not consider deep linguistic knowledge, often resulting in low-quality translations. In order to improve translation quality, this study examines the feasibility of fusing two data augmentation strategies: the explicit incorporation of syntactic knowledge and the pre-trained language model BERT. The study first investigates, through syntactic probing experiments, what BERT knows about the syntax of source language sentences before and after MT fine-tuning, and uses a Quality Estimation (QE) model and the chi-square test to clarify the correlation between the syntactic knowledge of source language sentences and the quality of translations in the target language. The experimental results show that BERT can explicitly predict different types of dependency relations in source language sentences and exhibits different learning trends, which probes can reveal. Moreover, the experiments confirm a correlation between dependency relations in source language sentences and translation quality in MT scenarios. The dependency relations of source language sentences that frequently appear in low-quality translations are identified. Probes can be linked to these dependency relations, whose prediction scores tend to be higher in the middle layers of BERT than in the top layer. The study then presents dependency relation prediction experiments to examine whether a Graph Attention Network (GAT) learns syntactic dependencies and investigates how it learns such knowledge under different combinations of attention heads and model layers. Additionally, the study examines the potential of incorporating GAT-based syntactic predictions in MT scenarios by comparing GAT with fine-tuned BERT on dependency relation prediction. Based on the paired t-test and prediction scores, GAT outperforms MT-B, a version of BERT specifically fine-tuned for MT, exhibiting higher prediction scores for the majority of dependency relations. For some dependency relations, it even outperforms UD-B, a version of BERT specifically fine-tuned for syntactic dependencies. However, GAT has difficulty predicting accurately depending on the quantity and subtype of dependency relations, which can lead to lower prediction scores. Finally, the study proposes a novel MT architecture, Syntactic knowledge via Graph attention with BERT (SGB), and examines how translation quality changes from various perspectives. The experimental results indicate that the SGB engines can improve low-quality translations across different source sentence lengths and, based on the QE scores, better recognize the syntactic structure defined by the dependency relations of source language sentences. However, improving translation quality relies on BERT correctly modeling the source language sentences; otherwise, the syntactic knowledge on the graphs has limited impact. The prediction scores of GAT for dependency relations can also be linked to improved translation quality. GAT allows some layers of BERT to reconsider the syntactic structures of the source language sentences. Using XLM-R instead of BERT still results in improved translation quality, indicating the effectiveness of syntactic knowledge on graphs. These experiments not only demonstrate the effectiveness of the proposed strategies but also provide explanations, offering inspiration for future work that fuses graph neural networks modeling linguistic knowledge with pre-trained language models in MT scenarios.
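    As a loose illustration of the fusion idea, the following sketch runs one graph attention layer (torch_geometric's GATConv) over a toy dependency edge list, using BERT token vectors as node features, and residually adds the result back onto the BERT states; the model name, the example sentence, the hand-written edges, and the residual fusion step are assumptions for illustration and do not reproduce the SGB architecture or its probing setup.

# Hypothetical sketch: graph attention over a source sentence's dependency edges,
# with BERT token vectors as node features. Everything here is illustrative.
import torch
from torch_geometric.nn import GATConv
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")

sentence = "The chairman opened the debate"
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**enc).last_hidden_state[0]   # (num_wordpieces, 768) node features

# Toy dependency edges between word-piece indices (head -> dependent); in a real
# pipeline these would come from a dependency parser, not be hand-written.
edge_index = torch.tensor([[2, 2, 4, 3],
                           [1, 3, 3, 5]], dtype=torch.long)

gat = GATConv(in_channels=768, out_channels=96, heads=8, concat=True)  # 8 * 96 = 768
syntax_aware = gat(hidden, edge_index)          # (num_wordpieces, 768)

# One simple fusion choice: add the graph-attended vectors back onto the BERT
# states before they feed an MT decoder (a design stand-in, not SGB itself).
fused = hidden + syntax_aware

    The residual formulation is only one possible way to let the syntactic signal adjust, rather than replace, the pre-trained representations; the thesis itself compares several perspectives on how such a combination affects translation quality.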

    Great expectations: unsupervised inference of suspense, surprise and salience in storytelling

    Stories interest us not because they are a sequence of mundane and predictable events but because they have drama and tension. Crucial to creating dramatic and exciting stories are surprise and suspense. Likewise, certain events are key to the plot and more important than others; this importance is referred to as salience. Inferring suspense, surprise, and salience is highly challenging for computational systems, because all these elements require a strong comprehension of the characters and their motivations, places, changes over time, and the cause and effect of complex interactions. Recent advances in machine learning (often called deep learning) have substantially improved performance on many language-related tasks, including story comprehension and story writing. Most of these systems rely on supervision; that is, large numbers of people need to tag large quantities of data to teach the system, for example by tagging which events are suspenseful. This is highly inflexible and costly. Instead, this thesis trains a series of deep learning models solely by reading stories, a self-supervised (or unsupervised) approach. Methods from narrative theory (rules and procedures) are applied to the knowledge built into the deep learning models to directly infer suspense, surprise, and salience in stories. Extensions add memory and external knowledge from story plots and from Wikipedia to infer salience on novels such as Great Expectations and plays such as Macbeth. Other work adapts the models as a planning system for generating new stories. The thesis finds that applying narrative theory to deep learning models can produce inferences that align with those of a typical reader. In follow-up work, these insights could help improve computational models for tasks such as automatic story writing, writing assistance, and summarising or editing stories. Moreover, applying narrative theory to the knowledge inherent in a system that teaches itself (self-supervised) by reading books, watching videos, or listening to audio is much cheaper and more adaptable to other domains and tasks. Progress in improving self-supervised systems is swift. As such, the thesis's broader relevance is that combining domain expertise with these systems may be a more productive approach in many areas where machine learning is applied.
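    As a purely illustrative sketch (not the thesis's models), the following code expresses two of the narrative-theory quantities in the simplest possible terms: surprise as the shift between consecutive story-state vectors, and suspense as the expected shift over sampled continuations; the random vectors stand in for sentence representations that a self-supervised encoder would produce.

# Illustrative only: surprise and suspense over stand-in story-state vectors.
import numpy as np

def surprise(prev_state: np.ndarray, curr_state: np.ndarray) -> float:
    """How much the latest sentence shifts the representation of the story so far."""
    return float(np.linalg.norm(curr_state - prev_state))

def suspense(curr_state: np.ndarray, sampled_continuations: np.ndarray) -> float:
    """Expected shift over plausible next sentences sampled by a story model."""
    diffs = np.linalg.norm(sampled_continuations - curr_state, axis=1)
    return float(diffs.mean())

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 768))              # five consecutive sentence vectors
continuations = rng.normal(size=(10, 768))      # ten sampled next-sentence vectors

print([round(surprise(states[i - 1], states[i]), 2) for i in range(1, 5)])
print(round(suspense(states[-1], continuations), 2))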