
    Automatic Sentiment Analysis in On-line Text

    The growing stream of content placed on the Web provides a huge collection of textual resources. People share their experiences on-line, ventilate their opinions (and frustrations), or simply talk about anything at all. The large amount of available data creates opportunities for automatic mining and analysis. In this paper, we are interested in how people feel about certain topics. We treat this as a classification task: their feelings can be positive, negative or neutral. A sentiment is not always stated clearly in the text; it is often expressed in subtle, complex ways. Besides directly expressing their feelings towards a certain topic, users can draw on a diverse range of other techniques to convey their emotions. On top of that, authors may mix objective and subjective information about a topic, or write down thoughts about topics other than the one we are investigating. Lastly, data gathered from the World Wide Web often contains a lot of noise. All of this makes automatic recognition of sentiment in on-line text more difficult. We give an overview of various techniques used to tackle these problems in the domain of sentiment analysis, and add some results of our own.
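    The positive/negative/neutral classification task described above can be sketched with a minimal lexicon-based classifier. This is a toy illustration, not the authors' system; the word lists are hypothetical stand-ins for a real sentiment lexicon:

    ```python
    # Toy sentiment lexicon (hypothetical; a real system would use a much larger resource).
    POS = {"good", "great", "love", "excellent"}
    NEG = {"bad", "terrible", "hate", "awful"}

    def classify(text: str) -> str:
        """Label a text positive, negative or neutral by counting lexicon hits."""
        tokens = text.lower().split()
        score = sum(t in POS for t in tokens) - sum(t in NEG for t in tokens)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"
    ```

    As the abstract notes, such surface counting misses subtle or implicit sentiment; it only serves to make the classification framing concrete.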

    Opinion mining with the SentiWordNet lexical resource

    Sentiment classification concerns the application of automatic methods for predicting the orientation of sentiment in text documents. It is an important subject in opinion mining research, with applications in a number of areas including recommender and advertising systems, customer intelligence and information retrieval. SentiWordNet is a lexical resource of sentiment information for terms in the English language, designed to assist in opinion mining tasks: each term is associated with numerical scores for positive and negative sentiment. A resource that makes term-level sentiment information readily available could be of use in building more effective sentiment classification methods. This research presents the results of an experiment that applied the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews. First, a set of relevant features extracted from text documents using SentiWordNet was designed and implemented. The resulting feature set was then used as input for training a support vector machine classifier to predict the sentiment orientation of the underlying film review. Several scenarios exploring variations on the parameters that generate the data set, outlier removal and feature selection were executed. The results obtained are compared to other methods documented in the literature; they are in line with other experiments that propose similar approaches and use the same data set of film reviews, indicating that SentiWordNet could become an important resource for the task of sentiment classification. Considerations on future improvements are also presented, based on a detailed analysis of the classification results.
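    The feature-extraction step can be sketched as follows. The per-term (positive, negative) score pairs below are hypothetical placeholders for SentiWordNet entries, and the aggregation (summing scores per document) is one simple choice among the many the paper explores; the resulting vector would then feed an SVM:

    ```python
    # Hypothetical (positive, negative) score pairs standing in for SentiWordNet entries.
    SCORES = {
        "good":   (0.75, 0.0),
        "bad":    (0.0, 0.5),
        "boring": (0.0, 0.25),
    }

    def features(tokens):
        """Aggregate per-term sentiment scores into a document-level feature
        vector (total positive score, total negative score)."""
        pos = sum(SCORES.get(t, (0.0, 0.0))[0] for t in tokens)
        neg = sum(SCORES.get(t, (0.0, 0.0))[1] for t in tokens)
        return [pos, neg]
    ```

    Terms absent from the resource contribute nothing, which is one reason outlier removal and feature selection matter in practice.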

    Annotation and Prediction of Movie Sentiment Arcs

    Some narratologists have argued that all stories derive from a limited set of archetypes. Specifically, Vonnegut (2005) claims in his Shapes of Stories lecture that if we graph the emotions in a story over time, the shape will be an instance of one of six basic story shapes. The work of Jockers (2015) and Reagan et al. (2016) purports to confirm this hypothesis empirically using automatic sentiment analysis (rather than manual annotations of story arcs) and algorithms to cluster story arcs into fundamental shapes. Later work has applied similar techniques to movies (Del Vecchio et al., 2019). This line of work has attracted criticism. Swafford (2015) argues that sentiment analysis needs to be validated on and adapted to narrative text. Enderle (2016) argues that the various methods to reduce story shapes to the putative six fundamental types are actually producing algorithmic artifacts, and that random sentiment arcs can also be clustered into six “fundamental” shapes. In this paper I will not attempt to find fundamental (or even universal) story shapes; instead I take the observed story shape for each narrative as is, without trying to cluster shapes into archetypes. My aim is to perform an empirical validation of how well basic sentiment analysis tools can reproduce a sentiment arc obtained through manual annotation based on narrative text. Rather than considering novels as narratives, I consider movies, since the annotation of movies, when done in real time, is less time consuming. In a previous abstract, I considered the task of predicting the annotated sentiment of individual sentences from movie scripts (van Cranenburgh, 2020), and concluded that sentiment analysis tools achieve comparable performance on narrative text as compared to reviews and social media text (pace Swafford 2015). In this abstract I consider the task of predicting the overall sentiment as annotated based on watching the movie. This task is more challenging since the connection between the narrative sentiment and the narrative text is potentially more distant.
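    A sentiment arc of the kind discussed here is typically obtained by smoothing a sequence of per-sentence sentiment scores. The sketch below uses a simple moving average; this is an illustrative assumption, not the smoothing method of the paper:

    ```python
    def sentiment_arc(scores, window=3):
        """Smooth a sequence of per-sentence sentiment scores into a story
        arc using a centred moving average (window shrinks at the edges)."""
        half = window // 2
        arc = []
        for i in range(len(scores)):
            lo, hi = max(0, i - half), min(len(scores), i + half + 1)
            arc.append(sum(scores[lo:hi]) / (hi - lo))
        return arc
    ```

    The resulting curve is what would be compared against the arc from real-time manual annotation.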


    Active learning in annotating micro-blogs dealing with e-reputation

    Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinion about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets, using a manual annotation step as bootstrap. This paper focuses on key issues in active learning while building a large annotated dataset from noise, which is introduced by human annotators, by the abundance of data, and by the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can be used not only to improve automatic systems for Opinion Mining (OM) and Topic Classification, but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information. Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201
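    The core of a pool-based active learning round as described above is selecting the items the current model is least confident about and routing them to human annotators. A minimal sketch, with a caller-supplied confidence function standing in for the paper's NLP-based models:

    ```python
    def select_for_annotation(pool, confidence, budget=2):
        """Pick the `budget` items from the unlabelled pool that the current
        model scores with the lowest confidence; these go to human annotators."""
        return sorted(pool, key=confidence)[:budget]
    ```

    For example, with tweet length as a (purely illustrative) confidence proxy, the shortest tweets would be queued for expert labelling first; each round's new labels then retrain the model before the next selection.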

    Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

    In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable for extracting opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.
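    The normalisation-as-preprocessing idea can be illustrated with a tiny rule-based lookup table; this toy stands in for the far more capable SMT-based system the abstract evaluates, and the mappings below are hypothetical:

    ```python
    # Toy normalisation table (hypothetical), standing in for an SMT-based normaliser.
    NORMALISE = {"u": "you", "r": "are", "gr8": "great", "luv": "love"}

    def normalise(text: str) -> str:
        """Map noisy social-media tokens to their standard forms before
        the text is passed to a sentiment classifier."""
        return " ".join(NORMALISE.get(tok, tok) for tok in text.lower().split())
    ```

    The downstream classifier then sees standardised tokens that are more likely to match its training vocabulary and sentiment lexicons.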

    Detection and fine-grained classification of cyberbullying events

    In the current era of online interactions, both positive and negative experiences are abundant on the Web. As in real life, negative experiences can have a serious impact on youngsters. Recent studies have reported cybervictimization rates among teenagers that vary between 20% and 40%. In this paper, we focus on cyberbullying as a particular form of cybervictimization and explore its automatic detection and fine-grained classification. Data containing cyberbullying was collected from the social networking site Ask.fm. We developed and applied a new scheme for cyberbullying annotation, which describes the presence and severity of cyberbullying, a post author's role (harasser, victim or bystander) and a number of fine-grained categories related to cyberbullying, such as insults and threats. We present experimental results on the automatic detection of cyberbullying and explore the feasibility of detecting the more fine-grained cyberbullying categories in online posts. For the first task, an F-score of 55.39% is obtained. We observe that the detection of the fine-grained categories (e.g. threats) is more challenging, presumably due to data sparsity, and because they are often expressed in a subtle and implicit way.
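    The F-score reported above is the standard harmonic mean of precision and recall, computed from true-positive, false-positive and false-negative counts:

    ```python
    def f_score(tp: int, fp: int, fn: int) -> float:
        """F1 score: harmonic mean of precision (tp/(tp+fp)) and
        recall (tp/(tp+fn))."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)
    ```

    Under data sparsity, as with the rare fine-grained categories, a handful of missed positives (high `fn`) quickly drags this score down, which is consistent with the difficulty the abstract reports.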