Sentiment Analysis of Short Informal Texts
We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or a phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task 'Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains of up to 6.5 absolute percentage points.
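The idea of keeping separate lexicon entries for words in negated contexts can be illustrated with a minimal sketch. It assumes a simple convention that the abstract does not spell out: every token between a negation word and the next clause-level punctuation mark receives a `_NEG` suffix, so that negated and affirmative occurrences can be counted and scored separately. The negator list, the suffix, and the function name `mark_negated` are all illustrative assumptions, not the system's actual implementation.

```python
import re

# Assumed, deliberately small negator list (illustrative only).
NEGATORS = {"not", "no", "never", "cannot"}
# A token made up only of clause-level punctuation ends the negated context.
CLAUSE_END = re.compile(r"^[.,:;!?]+$")

def mark_negated(tokens):
    """Append '_NEG' to tokens that follow a negator, up to the next punctuation."""
    marked = []
    negated = False
    for tok in tokens:
        low = tok.lower()
        if CLAUSE_END.match(tok):
            negated = False          # punctuation closes the negated context
            marked.append(tok)
        elif low in NEGATORS or low.endswith("n't"):
            negated = True           # negator opens a negated context
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked

print(mark_negated("this is not good , but fun".split()))
```

With tokens marked this way, a lexicon built from the marked corpus naturally holds one score for `good` and a separate score for `good_NEG`.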
Enhancing Lexical Sentiment Analysis using LASSO Style Regularization
In the current information age, where expressing one’s opinions online requires but
a few button presses, there is great interest in analyzing and predicting such emotional
expression. Sentiment analysis is described as the study of how to quantify and predict
such emotional expression by applying various analytical methods. This realm of study
can broadly be separated into two domains: approaches that quantify sentiment using sets of
features determined by humans, and approaches that utilize machine learning. An issue
with the latter approaches is that the features which describe sentiment within text
are challenging to interpret. By combining VADER (Valence Aware
Dictionary for sEntiment Reasoning), a lexicon model, with machine learning tools (simulated
annealing) and k-fold cross-validation, we can improve the performance of VADER
within and across contexts. To validate this modified VADER algorithm, we contribute
to the literature of sentiment analysis by sharing a dataset sourced from Steam, an online
video game platform. The benefit of using Steam for training purposes is that it
contains several unique properties of both social media and online web retailers such
as Amazon. The results obtained from applying this modified VADER algorithm indicate
that parameters need to be re-trained for each dataset/context, and that
using statistical learning tools to estimate these parameters improves the performance
of VADER within and across contexts. As an addendum, we provide a general overview
of the current state of sentiment analysis and apply BERT, a Transformer-based neural
network model, to the collected Steam dataset. These results are then compared to
both base VADER and modified VADER.
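As a rough illustration of tuning lexicon parameters with simulated annealing, the sketch below perturbs the word weights of a toy lexicon and keeps the best-scoring set found. It does not use the VADER library or the authors' parameterization: the toy lexicon, the Gaussian perturbation, the cooling schedule, and the names `accuracy` and `anneal` are all assumptions made for the example, and the k-fold cross-validation step is omitted for brevity.

```python
import math
import random

def accuracy(weights, data):
    """Fraction of texts whose summed word weights match the label sign."""
    correct = 0
    for tokens, label in data:
        score = sum(weights.get(t, 0.0) for t in tokens)
        predicted = 1 if score >= 0 else -1
        correct += predicted == label
    return correct / len(data)

def anneal(weights, data, steps=500, temp=1.0, cooling=0.99, seed=0):
    """Simulated annealing over lexicon weights; returns the best weights seen."""
    rng = random.Random(seed)
    current = dict(weights)
    current_acc = accuracy(current, data)
    best, best_acc = dict(current), current_acc
    keys = sorted(current)
    for _ in range(steps):
        candidate = dict(current)
        word = rng.choice(keys)
        candidate[word] += rng.gauss(0.0, 0.5)   # random perturbation of one weight
        cand_acc = accuracy(candidate, data)
        delta = cand_acc - current_acc
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_acc = candidate, cand_acc
            if current_acc > best_acc:
                best, best_acc = dict(current), current_acc
        temp *= cooling                          # geometric cooling schedule
    return best, best_acc

# Toy lexicon and labelled reviews (made up for illustration).
lexicon = {"fun": 1.0, "grind": 0.5, "buggy": -0.2, "refund": 0.1}
reviews = [
    (["fun", "grind"], 1),
    (["buggy", "refund"], -1),
    (["grind", "buggy"], -1),
    (["fun"], 1),
]
tuned, tuned_acc = anneal(lexicon, reviews)
print(accuracy(lexicon, reviews), tuned_acc)
```

Because the best configuration is tracked separately from the current one, the returned accuracy on the tuning data is never lower than that of the starting lexicon. Held-out evaluation, such as the k-fold scheme mentioned in the abstract, would still be needed to judge generalization.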
Sentiment analysis on Twitter through topic-based lexicon expansion
Supervised learning approaches are domain-dependent and it is costly to obtain labeled training data from different domains. Lexicon-based approaches enjoy stable performance across domains, but often cannot capture domain-dependent features. It is also hard for lexicon-based classifiers to identify the polarities of abbreviations and misspellings, which are common in short informal social text but usually not found in general sentiment lexicons. We propose to overcome this limitation by expanding a general lexicon with domain-dependent opinion words as well as abbreviations and informal opinion expressions. The expanded terms are automatically selected based on their mutual information with emoticons. As there is an abundance of emoticon-bearing tweets on Twitter, our approach provides a way to do domain-dependent sentiment analysis without the cost of data annotation. We show that our technique leads to statistically significant improvements in classification accuracies across 56 topics with a state-of-the-art lexicon-based classifier. We also present the expanded terms, and show the most representative opinion expressions obtained from co-occurrence with emoticons.
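Selecting terms by their association with emoticons can be sketched as follows. The abstract does not give the exact statistic, so this example assumes a common variant: the difference of pointwise mutual information with the positive and negative emoticon classes, which reduces to a smoothed log-ratio of class-conditional term frequencies. The emoticon sets, the add-one smoothing, the `min_count` threshold, and the function name `polarity_scores` are illustrative assumptions.

```python
import math
from collections import Counter

# Assumed emoticon sets, for illustration only.
POSITIVE = {":)", ":-)"}
NEGATIVE = {":(", ":-("}

def polarity_scores(tweets, min_count=2):
    """Score terms by PMI(term, positive) - PMI(term, negative)."""
    term_counts = {"pos": Counter(), "neg": Counter()}
    class_totals = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        has_pos = any(t in POSITIVE for t in tokens)
        has_neg = any(t in NEGATIVE for t in tokens)
        if has_pos == has_neg:
            continue  # skip tweets with no emoticon or with mixed emoticons
        label = "pos" if has_pos else "neg"
        class_totals[label] += 1
        # Count each term once per tweet, excluding the emoticons themselves.
        for term in set(tokens) - POSITIVE - NEGATIVE:
            term_counts[label][term] += 1
    scores = {}
    vocab = set(term_counts["pos"]) | set(term_counts["neg"])
    for term in vocab:
        pos = term_counts["pos"][term]
        neg = term_counts["neg"][term]
        if pos + neg < min_count:
            continue  # too rare to score reliably
        # Add-one smoothing; the PMI difference equals log P(term|pos) / P(term|neg).
        p_pos = (pos + 1) / (class_totals["pos"] + 2)
        p_neg = (neg + 1) / (class_totals["neg"] + 2)
        scores[term] = math.log2(p_pos / p_neg)
    return scores

tweets = ["gr8 game :)", "gr8 day :)", "ugh monday :(", "ugh rain :("]
print(polarity_scores(tweets))
```

Terms with strongly positive or negative scores, such as the informal `gr8` and `ugh` in this toy input, would be the candidates for expanding a general lexicon; rare terms below `min_count` are left unscored.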
Opinion Mining on Non-English Short Text
As the type and the number of such venues increase, automated analysis of
sentiment on textual resources has become an essential data mining task. In
this paper, we investigate the problem of mining opinions on the collection of
informal short texts. Both positive and negative sentiment strength of texts
are detected. We focus on a non-English language that has few resources for
text mining. This approach would help enhance the sentiment analysis in
languages where a list of opinionated words does not exist. We propose a new
method that projects the text into dense, low-dimensional feature vectors
according to the sentiment strength of the words. We detect the mixture of
positive and negative sentiments on a multi-variant scale. Empirical evaluation
of the proposed framework on Turkish tweets shows that our approach achieves good
results for opinion mining.