    Sentiment Analysis of Short Informal Texts

    We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task 'Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. Ablation experiments demonstrate that the automatically generated lexicons yield performance gains of up to 6.5 absolute percentage points.
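
    The lexicon construction described above can be sketched as follows. The abstract does not give the scoring formula, so this sketch assumes a PMI-difference score (PMI with the positive class minus PMI with the negative class) with add-0.5 smoothing, plus a simple negation scope that runs from a negator to the next punctuation mark. NEGATORS, mark_negation, and build_lexicon are hypothetical names. Negated occurrences receive a `_NEG` suffix, so they become separate entries, mirroring the idea of a separate lexicon for negated words.

```python
import math
import re
from collections import Counter

# Hypothetical negator list and clause-ending punctuation (assumptions, not from the paper).
NEGATORS = {"not", "no", "never", "don't", "can't", "isn't", "won't"}
CLAUSE_END = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Append _NEG to every token between a negator and the next clause-ending punctuation."""
    out, negated = [], False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            negated = False  # punctuation closes the negated context
            out.append(tok)
        elif tok.lower() in NEGATORS:
            negated = True  # the negator itself is kept unmarked
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out


def build_lexicon(labeled_tweets, smoothing=0.5):
    """Score each term as PMI(term, pos) - PMI(term, neg) (assumed scoring).

    labeled_tweets: iterable of (tokens, label) with label "pos" or "neg",
    where the label comes from a hashtag or emoticon already removed from
    the tokens. Both classes must be non-empty.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in labeled_tweets:
        (pos_counts if label == "pos" else neg_counts).update(mark_negation(tokens))
    n_pos = sum(pos_counts.values())
    n_neg = sum(neg_counts.values())
    lexicon = {}
    for term in set(pos_counts) | set(neg_counts):
        # The PMI difference reduces to log2(P(term | pos) / P(term | neg)).
        p_pos = (pos_counts[term] + smoothing) / n_pos
        p_neg = (neg_counts[term] + smoothing) / n_neg
        lexicon[term] = math.log2(p_pos / p_neg)
    return lexicon
```

    For instance, given `[(["good", "movie"], "pos"), (["not", "good", "."], "neg")]`, the term "good" receives a positive score while "good_NEG" receives a negative one, so the same word can carry opposite polarities in affirmative and negated contexts.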

    Enhancing Lexical Sentiment Analysis using LASSO Style Regularization

    In the current information age, where expressing one's opinions online requires but a few button presses, there is great interest in analyzing and predicting such emotional expression. Sentiment analysis is the study of how to quantify and predict such emotional expression by applying various analytical methods. The field can broadly be separated into two domains: approaches that quantify sentiment using sets of features determined by humans, and approaches that utilize machine learning. An issue with the latter is that the features which describe sentiment within text are challenging to interpret. By combining VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon-based model, with machine learning tools (simulated annealing) and k-fold cross-validation, we improve the performance of VADER within and across contexts. To validate this modified VADER algorithm, we contribute to the sentiment analysis literature a dataset sourced from Steam, an online video game platform. Steam is useful for training because it combines several distinctive properties of both social media and online retailers such as Amazon. The results obtained with the modified VADER algorithm indicate that its parameters need to be re-trained for each dataset or context, and that estimating these parameters with statistical learning tools improves the performance of VADER within and across contexts. As an addendum, we provide a general overview of the current state of sentiment analysis and apply BERT, a Transformer-based neural network model, to the collected Steam dataset. These results are then compared with both base VADER and modified VADER.
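
    A minimal sketch of tuning lexicon parameters with simulated annealing inside k-fold cross-validation follows. The abstract does not say which VADER parameters are tuned, so this is an assumption: each document is reduced to a hypothetical (positive valence sum, negative valence sum) pair, and annealing searches over a hypothetical negative-word weight and decision threshold. The normalization x / sqrt(x² + 15) has the same form as VADER's compound-score normalization, and the starting threshold of 0.05 follows VADER's commonly used positive cutoff; the sketch does not call the vaderSentiment library.

```python
import math
import random

ALPHA = 15  # VADER's default normalization constant


def compound(raw):
    """VADER-style normalization of a raw valence sum into (-1, 1)."""
    return raw / math.sqrt(raw * raw + ALPHA)


def accuracy(docs, labels, params):
    """docs: (pos_sum, neg_sum) pairs; params: (neg_weight, threshold), both hypothetical."""
    neg_weight, threshold = params
    correct = 0
    for (pos_sum, neg_sum), y in zip(docs, labels):
        pred = 1 if compound(pos_sum + neg_weight * neg_sum) >= threshold else 0
        correct += pred == y
    return correct / len(labels)


def anneal(docs, labels, start, steps=2000, temp0=0.1, seed=0):
    """Simulated annealing over (neg_weight, threshold); returns the best params seen."""
    rng = random.Random(seed)
    current = best = start
    cur_acc = best_acc = accuracy(docs, labels, start)
    for step in range(steps):
        temp = temp0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        cand = (
            max(0.0, current[0] + rng.gauss(0, 0.2)),
            min(1.0, max(-1.0, current[1] + rng.gauss(0, 0.05))),
        )
        cand_acc = accuracy(docs, labels, cand)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if cand_acc >= cur_acc or rng.random() < math.exp((cand_acc - cur_acc) / temp):
            current, cur_acc = cand, cand_acc
            if cur_acc > best_acc:
                best, best_acc = current, cur_acc
    return best


def k_fold_accuracy(docs, labels, k=5, seed=0):
    """Fit params by annealing on k-1 folds, score on the held-out fold, and average."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        params = anneal([docs[i] for i in train], [labels[i] for i in train],
                        start=(1.0, 0.05), seed=seed)
        scores.append(accuracy([docs[i] for i in held], [labels[i] for i in held], params))
    return sum(scores) / k
```

    Annealing occasionally accepts a worse candidate while the temperature is high, which helps it escape flat regions of the accuracy surface. Fitting the parameters only on each training fold keeps the held-out accuracy from being inflated by tuning on test data, which matters when the claim is that parameters must be re-trained per context.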

    Sentiment analysis on Twitter through topic-based lexicon expansion

    Supervised learning approaches are domain-dependent, and obtaining labeled training data for different domains is costly. Lexicon-based approaches enjoy stable performance across domains but often cannot capture domain-dependent features. It is also hard for lexicon-based classifiers to identify the polarities of abbreviations and misspellings, which are common in short informal social text but usually absent from general sentiment lexicons. We propose to overcome this limitation by expanding a general lexicon with domain-dependent opinion words as well as abbreviations and informal opinion expressions. The expanded terms are automatically selected based on their mutual information with emoticons. Because emoticon-bearing tweets are abundant on Twitter, our approach enables domain-dependent sentiment analysis without the cost of data annotation. We show that our technique leads to statistically significant improvements in classification accuracy across 56 topics with a state-of-the-art lexicon-based classifier. We also present the expanded terms and show the most representative opinion expressions obtained from co-occurrence with emoticons.
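
    The term-selection step can be sketched as below, assuming tweets are already tokenized into sets and labelled "pos" or "neg" by their emoticons. Mutual information is computed from the standard 2x2 contingency table of term presence versus class. Assigning polarity by comparing a term's positive co-occurrence rate with the overall positive rate, and the names expand_lexicon and top_k, are illustrative assumptions rather than the paper's exact procedure.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """MI (bits) between term presence and emoticon class from a 2x2 table.

    n11: term present, positive; n10: present, negative;
    n01: absent, positive;      n00: absent, negative.
    """
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    # Sum P(t, c) * log2(P(t, c) / (P(t) P(c))) over the cells; empty cells contribute 0.
    return sum((joint / n) * math.log2(n * joint / (t * c)) for joint, t, c in cells if joint > 0)


def expand_lexicon(tweets, base_lexicon, top_k=100):
    """tweets: list of (token set, "pos"/"neg") labelled by emoticon.

    Returns {term: (polarity, mi)} for the top_k terms not already in base_lexicon.
    """
    n_pos = sum(1 for _, label in tweets if label == "pos")
    n_neg = len(tweets) - n_pos
    vocab = set().union(*(tokens for tokens, _ in tweets)) - set(base_lexicon)
    scored = []
    for term in vocab:
        n11 = sum(1 for tokens, label in tweets if term in tokens and label == "pos")
        n10 = sum(1 for tokens, label in tweets if term in tokens and label == "neg")
        mi = mutual_information(n11, n10, n_pos - n11, n_neg - n10)
        # Polarity: "pos" if the term co-occurs with positive emoticons above the base rate.
        polarity = "pos" if n11 / (n11 + n10) > n_pos / len(tweets) else "neg"
        scored.append((term, polarity, mi))
    scored.sort(key=lambda item: (-item[2], item[0]))  # highest MI first, ties by term
    return {term: (polarity, mi) for term, polarity, mi in scored[:top_k]}
```

    Because MI is symmetric in the two classes, a term such as "ugh" that appears only in negative tweets scores as high as one that appears only in positive tweets; the separate polarity check decides which side of the lexicon it joins.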

    Opinion Mining on Non-English Short Text

    As the type and the number of such venues increase, automated analysis of sentiment in textual resources has become an essential data mining task. In this paper, we investigate the problem of mining opinions from collections of informal short texts. Both the positive and the negative sentiment strengths of texts are detected. We focus on a non-English language that has few resources for text mining. This approach would help enhance sentiment analysis in languages where no list of opinionated words exists. We propose a new method that projects the text into dense, low-dimensional feature vectors according to the sentiment strength of its words. We detect the mixture of positive and negative sentiments on a multi-variant scale. Empirical evaluation of the proposed framework on Turkish tweets shows that our approach achieves good results for opinion mining.
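
    The abstract does not specify the projection, so the following is only one possible reading: each word carries a hypothetical strength on a dual scale (1 to 5 positive, -1 to -5 negative), and a text is mapped to a 10-dimensional vector of the relative frequencies of each strength level. The dual_strength helper reports positive and negative strength separately, using 1 and -1 for "no sentiment" on each side, a convention assumed here from SentiStrength-style dual scoring. The WORD_STRENGTH entries are illustrative Turkish words with made-up scores, and tokens are assumed to be lowercased already.

```python
from collections import Counter

# Hypothetical word strengths on a dual scale (illustrative values, not from the paper).
WORD_STRENGTH = {"güzel": 3, "harika": 5, "kötü": -3, "berbat": -5}
LEVELS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]


def project(tokens, strengths=WORD_STRENGTH):
    """Dense 10-dim vector: share of tokens at each sentiment-strength level."""
    counts = Counter(strengths[t] for t in tokens if t in strengths)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[level] / total for level in LEVELS]


def dual_strength(tokens, strengths=WORD_STRENGTH):
    """Return (positive, negative) strength; 1 / -1 mean no sentiment on that side."""
    hits = [strengths[t] for t in tokens if t in strengths]
    pos = max((h for h in hits if h > 0), default=1)
    neg = min((h for h in hits if h < 0), default=-1)
    return pos, neg
```

    For example, the mixed text `["film", "harika", "ama", "sonu", "kötü"]` ("the film is wonderful but its ending is bad") yields `(5, -3)` from dual_strength, capturing both sentiments at once rather than collapsing them into a single polarity.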