20 research outputs found

    Teaching Computational Methods to Humanities Students

    This paper discusses the academic and societal implications of teaching computational methods to humanities students from the perspective of digital humanities. Pedagogical choices are backed up by both pedagogical theory and concrete examples from actual courses and course feedback. The aim of this paper is to introduce clear best-practice recommendations for developing digital humanities teaching with an emphasis on methods teaching in order to increase the number of students who understand such methods and can apply them to their own projects.
    Peer reviewed

    Sentimentator : Gamifying Fine-grained Sentiment Annotation

    Peer reviewed

    XED : A Multilingual Dataset for Sentiment Analysis and Emotion Detection

    We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 43 additional languages, providing new resources to many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral. The dataset is carefully evaluated using language-specific BERT to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
    Peer reviewed

    LT@Helsinki at SemEval-2020 Task 12 : Multilingual or language-specific BERT?

    This paper presents the different models submitted by the LT@Helsinki team for the SemEval-2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID dataset. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.
    Peer reviewed

    Strategic sentiments and emotions in post-Second World War party manifestos in Finland

    We contribute to the growing number of studies on emotions and politics by investigating how political parties strategically use sentiments and emotions in party manifestos. We use computational methods in examining changes of sentiments and emotions in Finnish party manifestos from 1945 to 2019. We use sentiment and emotion lexicons first translated from English into Finnish and then modified for the purposes of our study. We analyze how the use of emotions and sentiments differs between government and opposition parties depending on their left/right ideology and the specific type of party manifesto. In addition to traditional sentiment and emotion analysis, we use emotion intensity analysis. Our results indicate that in Finland, government and opposition parties do not differ substantially from each other in their use of emotional language. From a historical perspective, the individual emotions used in party manifestos have persisted, but changes have taken place in the intensity of using emotion words. We also find that in comparison with other parties, populist parties both appeal to different emotions and appeal to the same emotions with different intensities.
    Peer reviewed

    Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation

    This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open source and can easily be extended and applied for various purposes.
    Peer reviewed

    Language Change Database: a new online resource

    We introduce the Language Change Database (LCD), which provides access to the results of previous corpus-based research dealing with change in the English language. The LCD will be published on an open-access linked data platform that will allow users to enter information about their own publications into the database and to conduct searches based on linguistic and extralinguistic parameters. Both metadata and numerical data from the original publications will be available for download, enabling systematic reviews, meta-analyses, replication studies and statistical modelling of language change. The LCD will be of interest to scholars, teachers and students of English.
    Peer reviewed

    Challenges in Annotation : Annotator Experiences from a Crowdsourced Emotion Annotation Task

    With the prevalence of machine learning in natural language processing and other fields, an increasing number of crowd-sourced data sets are created and published. However, very little has been written about the annotation process from the point of view of the annotators. This pilot study aims to help fill the gap and provide insights into how to maximize the quality of crowd-sourced annotation output, with a focus on fine-grained sentence-level sentiment and emotion annotation from the annotators' point of view.
    Peer reviewed

    The Language of Emotions : Building and Applying Computational Methods for Emotion Detection for English and Beyond

    Emotions have always been central to the human experience: the ancient Greeks had philosophical debates about the nature of emotions and Charles Darwin can be said to have founded the modern theories of emotions with his study "The expression of the emotions in man and animals". Theories of emotion are still actively researched in many different fields from psychology, cognitive science, and anthropology to computer science. Sentiment analysis usually refers to the use of computational tools to identify and extract sentiments and emotions from various modalities. In this dissertation, I use sentiment analysis in conjunction with natural language processing to identify, quantify, and classify emotions in text. Specifically, emotions are examined in multilingual settings using multidimensional models of emotions. Plutchik's wheel of emotions and emotional intensities are used to classify emotions in parallel corpora via both lexical methods and supervised machine learning methods. By analyzing emotional language content in text, the connection between language and emotions can be better understood. I have developed new approaches to create a more equitable natural language processing approach for sentiment analysis, meaning the development and evaluation of massively multilingual annotated datasets, contributing to the provision of tools for under-resourced languages. This dissertation is comprised of ten articles on related topics in sentiment analysis. In these articles, I discuss lexicon-based methods and the creation of emotion and sentiment lexicons, the creation of datasets for supervised machine learning, the training of models for supervised machine learning, and the evaluation of such models. I also examine the annotation process in relation to creating datasets in depth, including the creation of a light-weight easily deployed annotation platform. As an additional step, I test the different approaches in downstream applications. 
    These practical applications include the study of political party rhetoric from the perspective of emotion words used and the intensities of those emotion words. I also examine how simple lexicon-based methods can be used to make the study of affect in literature less subjective. Additionally, I attempt to link sentiment analysis with hate speech detection and offensive speech target identification. The main contribution of this dissertation is in providing tools for sentiment analysis and in demonstrating how these tools can be augmented for use in a wide variety of languages and practical applications at low cost.

    Finnish abstract (translated): Emotions have always been central to the human experience: the ancient Greeks held philosophical debates about the nature of emotions, and Charles Darwin can be said to have founded modern theories of emotion with his book "The Expression of the Emotions in Man and Animals". Theories of emotion are still actively studied in many fields, from psychology, cognitive science, and anthropology to computer science. Sentiment analysis usually refers to identifying and extracting emotions with computational methods across different modalities. In this dissertation, I use sentiment analysis together with language technology methods to identify, quantify, and classify emotions in text. In particular, I study emotions and their use in multilingual settings using multidimensional models of emotion. I use Plutchik's wheel of emotions and emotional intensities to classify the emotions that occur in parallel corpora, using both lexical and supervised machine learning methods. By analyzing the emotional language of a text, the connection between language and emotions can be better understood. I have developed new approaches that ease the development and evaluation of massively multilingual annotated data, which in turn helps provide tools for low-resource languages.
    This dissertation consists of ten articles that relate to sentiment analysis in different ways. In these articles, I discuss lexicon-based methods, the development of sentiment and emotion lexicons, the collection of data for machine learning, and the evaluation of machine learning models using that data. I also apply these approaches in the social sciences and literary studies. I also examine how the annotation process affects the quality of the data annotators produce, drawing on the development of the Sentimentator annotation tool. The main contribution of this dissertation is the development of text analysis tools and the demonstration of their usefulness across several languages and a variety of practical applications.