211 research outputs found
A multivalued emotion lexicon created and evaluated by the crowd
Sentiment analysis aims to uncover emotions conveyed through information. In its simplest form, it is performed on a polarity basis, where the goal is to classify information with positive or negative emotion. Recent research has explored more nuanced ways to perform emotion analysis. Unsupervised emotion analysis methods require a critical resource: a lexicon that is appropriate for the task at hand, in terms of the emotional range and diversity captured. Emotion analysis lexicons are created manually by domain experts and usually assign one single emotion to each word. We propose an automated workflow for creating and evaluating a multi- valued emotion lexicon created and evaluated through crowdsourcing. We compare the obtained lexicon with established lexicons and appoint expert English Linguists to assess crowd peer-evaluations. The proposed workflow provides a quality lexicon and can be used in a range of text property association tasks
A multivalued emotion lexicon created and evaluated by the crowd
Sentiment analysis aims to uncover emotions conveyed through information. In its simplest form, it is performed on a polarity basis, where the goal is to classify information with positive or negative emotion. Recent research has explored more nuanced ways to perform emotion analysis. Unsupervised emotion analysis methods require a critical resource: a lexicon that is appropriate for the task at hand, in terms of the emotional range and diversity captured. Emotion analysis lexicons are created manually by domain experts and usually assign one single emotion to each word. We propose an automated workflow for creating and evaluating a multi- valued emotion lexicon created and evaluated through crowdsourcing. We compare the obtained lexicon with established lexicons and appoint expert English Linguists to assess crowd peer-evaluations. The proposed workflow provides a quality lexicon and can be used in a range of text property association tasks
Social-media monitoring for cold-start recommendations
Generating personalized movie recommendations to users is a problem that most commonly relies on user-movie ratings. These ratings are generally used either to understand the user preferences or to recommend movies that users with similar rating patterns have rated highly. However, movie recommenders are often subject to the Cold-Start problem: new movies have not been rated by anyone, so, they will not be recommended to anyone; likewise, the preferences of new users who have not rated any movie cannot be learned. In parallel, Social-Media platforms, such as Twitter, collect great amounts of user feedback on movies, as these are very popular nowadays. This thesis proposes to explore feedback shared on Twitter to predict the popularity of new movies and show how it can be used to tackle the Cold-Start problem. It also proposes, at a finer grain, to explore the reputation of directors and actors on IMDb to tackle the Cold-Start problem. To assess these aspects, a Reputation-enhanced Recommendation Algorithm is implemented and evaluated on a crawled IMDb dataset with previous user ratings of old movies,together with Twitter data crawled from January 2014 to March 2014, to recommend 60 movies affected by the Cold-Start problem. Twitter revealed to be a strong reputation predictor, and the Reputation-enhanced Recommendation Algorithm improved over several baseline methods. Additionally, the algorithm also proved to be useful when recommending movies in an extreme Cold-Start scenario, where both new movies and users are affected by the Cold-Start problem
Econometrics meets sentiment : an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software
A review of sentiment analysis research in Arabic language
Sentiment analysis is a task of natural language processing which has
recently attracted increasing attention. However, sentiment analysis research
has mainly been carried out for the English language. Although Arabic is
ramping up as one of the most used languages on the Internet, only a few
studies have focused on Arabic sentiment analysis so far. In this paper, we
carry out an in-depth qualitative study of the most important research works in
this context by presenting limits and strengths of existing approaches. In
particular, we survey both approaches that leverage machine translation or
transfer learning to adapt English resources to Arabic and approaches that stem
directly from the Arabic language
A Structured Narrative Prompt for Prompting Narratives from Large Language Models: Sentiment Assessment of ChatGPT-Generated Narratives and Real Tweets
Large language models (LLMs) excel in providing natural language responses that sound authoritative, reflect knowledge of the context area, and can present from a range of varied perspectives. Agent-based models and simulations consist of simulated agents that interact within a simulated environment to explore societal, social, and ethical, among other, problems. Simulated agents generate large volumes of data and discerning useful and relevant content is an onerous task. LLMs can help in communicating agents\u27 perspectives on key life events by providing natural language narratives. However, these narratives should be factual, transparent, and reproducible. Therefore, we present a structured narrative prompt for sending queries to LLMs, we experiment with the narrative generation process using OpenAI\u27s ChatGPT, and we assess statistically significant differences across 11 Positive and Negative Affect Schedule (PANAS) sentiment levels between the generated narratives and real tweets using chi-squared tests and Fisher\u27s exact tests. The narrative prompt structure effectively yields narratives with the desired components from ChatGPT. In four out of forty-four categories, ChatGPT generated narratives which have sentiment scores that were not discernibly different, in terms of statistical significance (alpha level α = 0.05), from the sentiment expressed in real tweets. Three outcomes are provided: (1) a list of benefits and challenges for LLMs in narrative generation; (2) a structured prompt for requesting narratives of an LLM chatbot based on simulated agents\u27 information; (3) an assessment of statistical significance in the sentiment prevalence of the generated narratives compared to real tweets. This indicates significant promise in the utilization of LLMs for helping to connect a simulated agent\u27s experiences with real people
Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated
texts like tweets, blog posts or product reviews are shared online. Sentiment polarity
analysis of such texts has become highly attractive and is utilized in recommender
systems, market predictions, business intelligence and more. We also witness deep
learning techniques becoming top performers on those types of tasks. There are
however several problems that need to be solved for efficient use of deep neural
networks on text mining and text polarity analysis.
First of all, deep neural networks are data hungry. They need to be fed with
datasets that are big in size, cleaned and preprocessed as well as properly labeled.
Second, the modern natural language processing concept of word embeddings as a
dense and distributed text feature representation solves sparsity and dimensionality
problems of the traditional bag-of-words model. Still, there are various uncertainties
regarding the use of word vectors: should they be generated from the same dataset
that is used to train the model or it is better to source them from big and popular
collections that work as generic text feature representations? Third, it is not easy for
practitioners to find a simple and highly effective deep learning setup for various
document lengths and types. Recurrent neural networks are weak with longer texts
and optimal convolution-pooling combinations are not easily conceived. It is thus
convenient to have generic neural network architectures that are effective and can
adapt to various texts, encapsulating much of design complexity.
This thesis addresses the above problems to provide methodological and practical
insights for utilizing neural networks on sentiment analysis of texts and achieving
state of the art results. Regarding the first problem, the effectiveness of various
crowdsourcing alternatives is explored and two medium-sized and emotion-labeled
song datasets are created utilizing social tags. One of the research interests of Telecom
Italia was the exploration of relations between music emotional stimulation and
driving style. Consequently, a context-aware music recommender system that aims
to enhance driving comfort and safety was also designed. To address the second
problem, a series of experiments with large text collections of various contents and
domains were conducted. Word embeddings of different parameters were exercised
and results revealed that their quality is influenced (mostly but not only) by the
size of texts they were created from. When working with small text datasets, it is
thus important to source word features from popular and generic word embedding
collections. Regarding the third problem, a series of experiments involving convolutional
and max-pooling neural layers were conducted. Various patterns relating
text properties and network parameters with optimal classification accuracy were
observed. Combining convolutions of words, bigrams, and trigrams with regional
max-pooling layers in a couple of stacks produced the best results. The derived
architecture achieves competitive performance on sentiment polarity analysis of
movie, business and product reviews.
Given that labeled data are becoming the bottleneck of the current deep learning
systems, a future research direction could be the exploration of various data programming
possibilities for constructing even bigger labeled datasets. Investigation
of feature-level or decision-level ensemble techniques in the context of deep neural
networks could also be fruitful. Different feature types do usually represent complementary
characteristics of data. Combining word embedding and traditional text
features or utilizing recurrent networks on document splits and then aggregating the
predictions could further increase prediction accuracy of such models
- …