64 research outputs found

    “You’re trolling because” – A Corpus-based Study of Perceived Trolling and Motive Attribution in the Comment Threads of Three British Political Blogs

    This paper investigates the linguistically marked motives that participants attribute to those they call trolls in 991 comment threads of three British political blogs. The study is concerned with how these motives affect the discursive construction of trolling and trolls. Another goal of the paper is to examine whether the mainly emotional motives ascribed to trolls in the academic literature correspond with those that the participants attribute to the alleged trolls in the analysed threads. The paper identifies five broad motives ascribed to trolls: emotional/mental health-related/social reasons, financial gain, political beliefs, being employed by a political body, and unspecified political affiliation. It also points out that depending on these motives, trolling and trolls are constructed in various ways. Finally, the study argues that participants attribute motives to trolls not only to explain their behaviour but also to insult them.

    Robust part-of-speech tagging of social media text

    Part-of-speech (PoS) taggers are an important processing component in many Natural Language Processing (NLP) applications, which led to a variety of taggers for tackling this task. Recent work in this field showed that tagging accuracy on informal text domains is poor in comparison to formal text domains. In particular, social media text, which is inherently different from formal standard text, leads to a drastically increased error rate. These arising challenges originate in a lack of robustness of taggers towards domain transfers.
This increased error rate has an impact on NLP applications that depend on PoS information. The main contribution of this thesis is the exploration of the concept of robustness under the following three aspects: (i) domain robustness, (ii) language robustness and (iii) long tail robustness. Regarding (i), we start with an analysis of the phenomena found in informal text that make tagging this kind of text challenging. Furthermore, we conduct a comprehensive robustness comparison of many commonly used taggers for English and German by evaluating them on the text of several text domains. We find that the tagging of informal text is poorly supported by available taggers. A review and analysis of currently used methods to adapt taggers to informal text showed that these methods improve tagging accuracy but offer no satisfactory solution. We propose an alternative tagging approach that reaches an increased multi-domain tagging robustness. This approach is based on tagging in two steps. The first step tags on a coarse-grained level and the second step refines the tags to the fine-grained tags. Regarding (ii), we investigate whether each language requires a language-tailored PoS tagger or if the construction of a competitive language independent tagger is feasible. We explore the technical details that contribute to a tagger's language robustness by comparing taggers based on different algorithms to learn models of 21 languages. We find that language robustness is a less severe issue and that the impact of the tagger choice depends more on the granularity of the tagset that shall be learned than on the language. Regarding (iii), we investigate methods to improve tagging of infrequent phenomena of which no sufficient amount of annotated training data is available, which is a common challenge in the social media domain. We propose a new method to overcome this lack of data that offers an inexpensive way of producing more training data. 
In a field study, we show that the quality of the produced data suffices to train tagger models that can recognize these under-represented phenomena. Furthermore, we present two software tools, FlexTag and DeepTC, which we developed in the course of this thesis. These tools provide the necessary flexibility for conducting all the experiments in this thesis and ensure their reproducibility.

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Influence of the pandemic lockdown on Fridays for Future’s hashtag activism

    Social movement organizations (SMOs) increasingly rely on Twitter to create new and viral communication spaces alongside newsworthy protest events and communicate their grievances directly to the public. When the COVID-19 pandemic impeded street protests in spring 2020, SMOs had to adapt their strategies to online-only formats. We analyze the German-language Twitter communication of the climate movement Fridays for Future (FFF) before and during the lockdown to explain how SMOs adapted their strategy under online-only conditions. We collected (re-)tweets containing the hashtag #fridaysforfuture (N = 46,881 tweets, N = 225,562 retweets) and analyzed Twitter activity, use of hashtags, and predominant topics. Results show that although the number of tweets was already steadily declining before the lockdown, it dropped sharply during it. Moreover, the use of hashtags changed substantially and tweets focused increasingly on thematic discourses and debates around the legitimacy of FFF, while tweets about protests and calls for mobilization decreased.

    Lexical innovation on the web and social media

    This dissertation investigates the emergence and diffusion of English neologisms on the web and social media, employing a data-driven methodology to identify a substantial sample of 851 neologisms. Neologisms are examined from their coining to successful dissemination within the community, with the study revealing a wide spectrum of degrees of diffusion. The exploration extends to studying the usage and diffusion of selected neologisms on the web and on Twitter, with a particular focus on social dynamics and variation among different speaker groups. Moreover, the dissertation probes into semantic innovation, demonstrating substantial socio-semantic variation and polarized public discourse surrounding certain neologisms. The research conducts an extensive analysis of semantic innovation and socio-semantic variation, elucidating significant socio-semantic discrepancies between various communities. The dissertation sheds light on the social and semantic dynamics underpinning the life cycle of neologisms within a linguistically diverse community.