
    Sentiment Analysis in Social Streams

    In this chapter, we review and discuss the state of the art on sentiment analysis in social streams (such as web forums, microblogging systems, and social networks), aiming to clarify how user opinions, affective states, and intended emotional effects are extracted from user-generated content, how they are modeled, and how they could finally be exploited. We explain why sentiment analysis tasks are more difficult for social streams than for other textual sources, and entail going beyond classic text-based opinion mining techniques. We show, for example, that social streams may use vocabularies and expressions that exist outside the mainstream of standard, formal languages, and may reflect complex dynamics in the opinions and sentiments expressed by individuals and communities.

    Classifying Crises-Information Relevancy with Semantics

    Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data that are typically shared on such platforms during crises to determine which posts are truly relevant to the crisis, and which are not. Previous work on automatically classifying crisis information mostly focused on statistical features. However, such approaches tend to be inappropriate when processing data on a type of crisis that the model was not trained on, such as processing information about a train crash when the classifier was trained on floods, earthquakes, and typhoons. In such cases, the model needs to be retrained, which is costly and time-consuming. In this paper, we explore the impact of semantics in classifying Twitter posts across the same and different types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances classifier performance by up to 7.2% when classifying information about a new type of crisis.

    An Emotional Word Analyzer for Portuguese

    The analysis of sentiments, emotions and opinions in texts is increasingly important in the current digital world. The existing lexicons with emotional annotations for the Portuguese language are oriented to polarities, classifying words as positive, negative or neutral. To identify the emotional load intended by the author, it is also necessary to categorize the emotions expressed by individual words. EmoSpell is an extension of a morphological analyzer with semantic annotations of the emotional value of words. It uses Jspell as the morphological analyzer and a new dictionary with emotional annotations. This dictionary incorporates the lexical base EMOTAIX.PT, which classifies words based on three different levels of emotions: global, specific and intermediate. This paper describes the generation of the EmoSpell dictionary using three sources: the Jspell Portuguese dictionary and the lexical bases EMOTAIX.PT and SentiLex-PT. It also details the web application and web service that exploit this dictionary, and presents a validation of the proposed approach using a corpus of student texts with different emotional loads. The validation compares the analyses provided by EmoSpell with the mentioned emotional lexical bases on the ability to recognize emotional words and extract the dominant emotion from a text.

    LIWC-UD: Classifying Online Slang Terms into LIWC Categories


    Sentiment Lexicon Adaptation with Context and Semantics for the Social Web

    Sentiment analysis over social streams offers governments and organisations a fast and effective way to monitor the public's feelings towards policies, brands, business, etc. General-purpose sentiment lexicons have been used to compute sentiment from social streams, since they are simple and effective. They calculate the overall sentiment of texts by using a general collection of words with predetermined sentiment orientation and strength. However, a word's sentiment often varies with the context in which it appears, and new words might be encountered that are not covered by the lexicon, particularly in social media environments where content emerges and changes rapidly and constantly. In this paper, we propose a lexicon adaptation approach that uses contextual as well as semantic information extracted from DBPedia to update the words' weighted sentiment orientations and to add new words to the lexicon. We evaluate our approach on three different Twitter datasets, and show that enriching the lexicon with contextual and semantic information improves sentiment computation by 3.4% in average accuracy, and by 2.8% in average F1 measure.

    IberLEF 2021 Overview: Natural Language Processing for Iberian Languages

    IberLEF is a comparative evaluation campaign for Natural Language Processing Systems in Spanish and other Iberian languages. Its goal is to encourage the research community to organize competitive text processing, understanding and generation tasks in order to define new research challenges and set new state-of-the-art results in those languages. This paper summarizes the evaluation activities carried out in IberLEF 2021, which included twelve tasks dealing with emotions, stance and opinions, harmful information, health-related information extraction and discovery, humor and irony, and lexical acquisition. Overall, IberLEF activities were a remarkable collective effort involving 359 researchers from 22 countries in Europe, Asia and the Americas. The authors of this overview have been supported by the Spanish Government, Ministry of Science and Innovation, via research grants MISMIS (PGC2018-096212-B), MISMIS-BIAS (PGC2018-096212-B-C32) and MISMISFAKEnHATE (PGC2018-096212-B-C31); and by CONACyT-Mexico project CB-2015-01-257383 and the thematic networks program (Language Technologies Thematic Network). Gonzalo, J.; Montes-Y-Gómez, M.; Rosso, P. (2021). IberLEF 2021 Overview: Natural Language Processing for Iberian Languages. CEUR Workshop. 1-15. http://hdl.handle.net/10251/19056211

    Automatic extraction of mobility activities in microblogs

    Integrated Master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT

    Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets, we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. These 28 fine-grained emotion categories are representative of the range of emotions expressed in tweets, the microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets forms a gold standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results show that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text. Classifiers using features extracted from the linguistic cues associated with each category match or exceed the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text.

    Implementation of Classification of Geolocation of Country from Worldwide Tweets

    Social media are increasingly being used within the scientific community as a key source of data to help understand various natural and social phenomena, and this has prompted the development of a wide range of computational data mining tools that can extract knowledge from social media for both post-hoc and real-time analysis. The rise of interest in using social media as a source for research has motivated tackling the challenge of automatically geolocating tweets, given the lack of explicit location information in the majority of tweets. In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyze the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification.