A new ANEW: Evaluation of a word list for sentiment analysis in microblogs
Sentiment analysis of microblogs such as Twitter has recently gained a fair
amount of attention. One of the simplest sentiment analysis approaches compares
the words of a posting against a labeled word list in which each word has been
scored for valence, a 'sentiment lexicon' or 'affective word list'. Several
affective word lists exist, e.g., ANEW (Affective Norms for English Words),
developed before the advent of microblogging and sentiment analysis. I wanted
to examine how well ANEW and other word lists perform for detecting sentiment
strength in microblog posts compared with a new word list constructed
specifically for microblogs. I used manually labeled Twitter postings scored
for sentiment. Using simple word matching, I show that the new word list may
perform better than ANEW, though not as well as the more elaborate approach
found in SentiStrength.
Comment: 6 pages, 4 figures, 1 table. Submitted to "Making Sense of Microposts" (#MSM2011)
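The word-matching approach the abstract describes can be sketched in a few lines. The mini-lexicon below is invented for illustration; real ANEW entries rate valence on a 1-9 scale for over a thousand words.

```python
# Invented mini-lexicon; real ANEW entries rate valence on a 1-9 scale.
LEXICON = {"happy": 8.2, "love": 8.7, "sad": 2.1, "terrible": 1.9}

def valence(text, lexicon=LEXICON):
    """Average valence of the posting's words found in the lexicon,
    or None when no word matches."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else None
```

Averaging per-word valence, as above, is the "simple word matching" baseline that SentiStrength's more elaborate rules outperform.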
Sarcasm Detection and User Behaviour Analysis
Sarcasm is a form of sentiment in which people express negative emotions using positive words within a text. It is difficult even for humans to recognize. We are therefore interested in sarcasm detection in social media text, particularly tweets. In this paper we propose a new pattern-based approach for sarcasm detection, combined with a behavioral modeling approach that detects sarcasm more effectively by analyzing not only the content of tweets but also the activity traits of users derived from their past activities. Specifically, we employ several feature families for detecting sarcasm in tweets: sentiment-related features, punctuation-related features, syntactic and semantic features, and pattern-related features. We also develop a behavioral modeling approach to analyze user emotion and sentiment. Using several classifiers, namely decision trees, Support Vector Machines (SVM), boosting, and Maximum Entropy, we evaluate accuracy and performance. Our proposed approach reaches an accuracy of 94%.
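As a rough illustration of one of the feature families above, the sketch below computes toy punctuation-related cues; the abstract does not specify the exact features, so the particular counts chosen here are assumptions.

```python
import re

def punctuation_features(tweet):
    """Toy punctuation-related sarcasm cues: counts of exclamation marks,
    question marks, quotation marks, and fully capitalized words."""
    return {
        "exclam": tweet.count("!"),
        "quest": tweet.count("?"),
        "quotes": tweet.count('"'),
        "caps_words": sum(
            1 for w in re.findall(r"[A-Za-z]+", tweet)
            if w.isupper() and len(w) > 1
        ),
    }
```

Feature dictionaries like this one would then be vectorized and fed to a classifier such as an SVM alongside the sentiment, syntactic, and pattern features.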
Latent Dirichlet Markov Allocation for Sentiment Analysis
In recent years, probabilistic topic models have gained tremendous attention in data mining and natural language processing research. In the field of information retrieval for text mining, a variety of probabilistic topic models have been used to analyze the content of documents. A topic model is a generative model for documents: it specifies a probabilistic procedure by which documents can be generated. All topic models share the idea that documents are mixtures of topics, where a topic is a probability distribution over words. In this paper we describe the Latent Dirichlet Markov Allocation model (LDMA), a new generative probabilistic topic model based on Latent Dirichlet Allocation (LDA) and the Hidden Markov Model (HMM), which emphasizes extracting multi-word topics from text data. LDMA is a four-level hierarchical Bayesian model in which topics are associated with documents, words are associated with topics, and topics can be represented by single- or multi-word terms. To evaluate the performance of LDMA, we report results on aspect detection in sentiment analysis, comparing it to the basic LDA model.
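The core idea that documents are mixtures of topics, where each topic is a probability distribution over words, can be made concrete with a toy generative sampler; the two topics and their word distributions below are invented for illustration.

```python
import random

random.seed(0)  # reproducible toy run

# Two toy topics, each a probability distribution over words (LDA's core idea).
TOPICS = {
    "sports": {"game": 0.5, "team": 0.3, "score": 0.2},
    "food":   {"pizza": 0.5, "taste": 0.3, "score": 0.2},
}

def generate_doc(topic_mix, n_words=8):
    """Generate a document: for each word, draw a topic from the document's
    topic mixture, then draw a word from that topic's distribution."""
    words = []
    for _ in range(n_words):
        topic = random.choices(list(topic_mix), weights=list(topic_mix.values()))[0]
        dist = TOPICS[topic]
        words.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return words
```

LDMA's extension is to condition consecutive word draws on an HMM so that multi-word terms can be emitted from a single topic, rather than drawing each word independently as above.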
The role of approximate negators in modeling the automatic detection of negation in tweets
Although improvements have been made in the performance of sentiment analysis tools, the automatic detection of negated text (which affects negative sentiment prediction) still presents challenges. More research is needed on new forms of negation beyond prototypical negation cues such as “not” or “never.” The present research reports findings on the role of a set of words called “approximate negators,” namely “barely,” “hardly,” “rarely,” “scarcely,” and “seldom,” which, on specific occasions (such as when attached to a word from the non-affirmative adverb “any” family), can operationalize negation styles not yet explored. Using a corpus of 6,500 tweets, human annotation allowed for the identification of 17 recurrent usages of these words as negatives (such as “very seldom”) which, along with findings from the literature, helped engineer specific features that guided a machine learning classifier in predicting negated tweets. The machine learning experiments also modeled negation scope (i.e., which specific words are negated in the text) by employing lexical and dependency graph information. Promising results included F1 values ranging from 0.71 to 0.89 for negation detection and from 0.79 to 0.88 for scope detection. Future work will be directed to the application of these findings in automatic sentiment classification, further exploration of patterns in the data (such as part-of-speech recurrences for these new types of negation), and the investigation of sarcasm, formal language, and exaggeration as themes that emerged from observations during corpus annotation.
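A minimal sketch of the "approximate negator attached to an any-family word" pattern the abstract mentions; the adjacency rule here is a simplification, since the paper's classifier uses richer lexical and dependency-graph features.

```python
import re

# The five approximate negators studied in the paper.
APPROX_NEGATORS = {"barely", "hardly", "rarely", "scarcely", "seldom"}

def has_approximate_negation(tweet):
    """Flag a tweet when an approximate negator is immediately followed by a
    word from the 'any' family (any, anything, anyone, ...)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(
        tok in APPROX_NEGATORS and tokens[i + 1].startswith("any")
        for i, tok in enumerate(tokens[:-1])
    )
```

A binary cue like this would serve as one engineered feature among many; scope detection additionally has to decide which surrounding words fall under the negation.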
Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks
In this work, we propose a new model for aspect-based sentiment analysis. In
contrast to previous approaches, we jointly model the detection of aspects and
the classification of their polarity in an end-to-end trainable neural network.
We conduct experiments with different neural architectures and word
representations on the recent GermEval 2017 dataset. We were able to show
considerable performance gains by using the joint modeling approach in all
settings compared to pipeline approaches. The combination of a convolutional
neural network and fasttext embeddings outperformed the best submission of the
shared task in 2017, establishing a new state of the art.
Comment: EMNLP 201
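To illustrate what joint modeling means here, a joint model emits the aspect label and its polarity in a single step instead of chaining two separate predictors. The rule-based stand-in below only mirrors the joint output shape; the paper itself uses end-to-end trainable neural networks, and the cue words are invented.

```python
# Invented cue lexicons; purely illustrative, not the paper's method.
ASPECT_CUES = {"delay": "punctuality", "seat": "comfort"}
POLARITY_CUES = {"awful": "negative", "great": "positive"}

def predict_joint(sentence):
    """Emit (aspect, polarity) in one step, as a joint model does, instead of
    first detecting the aspect and then classifying polarity in a pipeline."""
    words = sentence.lower().split()
    aspect = next((ASPECT_CUES[w] for w in words if w in ASPECT_CUES), "other")
    polarity = next((POLARITY_CUES[w] for w in words if w in POLARITY_CUES), "neutral")
    return aspect, polarity
```

In a pipeline, errors from the aspect-detection stage propagate into polarity classification; training both outputs jointly avoids that error cascade, which is the source of the reported gains.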
Cross-Lingual and Low-Resource Sentiment Analysis
Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages.
This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language.
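The step of lexicalizing the training data using a bilingual dictionary can be sketched as simple token replacement; the English-Spanish dictionary below is invented, and the thesis's actual target languages and dictionaries differ.

```python
# Invented English-Spanish dictionary; the thesis covers many target languages.
EN_ES = {"good": "bueno", "bad": "malo", "movie": "película"}

def lexicalize(tokens, bidict=EN_ES):
    """Replace each source-language token with its dictionary translation,
    keeping tokens that have no entry unchanged."""
    return [bidict.get(t, t) for t in tokens]
```

Because untranslated tokens pass through unchanged, the model can still train when the dictionary is sparse, which is why lexicalization is an optional feature rather than a requirement of the cross-lingual model.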
Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, including our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis.
To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments.
The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language.
In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment.
Opinion Mining on Non-English Short Text
As the type and the number of online venues for user opinions increase,
automated analysis of sentiment in textual resources has become an essential
data mining task. In this paper, we investigate the problem of mining opinions
from collections of informal short texts, detecting both the positive and the
negative sentiment strength of a text. We focus on a non-English language that
has few resources for text mining; this approach can help enhance sentiment
analysis in languages for which no list of opinionated words exists. We propose
a new method that projects the text into dense, low-dimensional feature vectors
according to the sentiment strength of the words, and we detect the mixture of
positive and negative sentiments on a multi-variant scale. Empirical evaluation
of the proposed framework on Turkish tweets shows that our approach achieves
good results for opinion mining.
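The idea of detecting positive and negative strength separately can be illustrated by mapping a token list onto a two-dimensional (positive strength, negative strength) vector. The tiny Turkish lexicon below is invented for illustration; the paper's method learns dense features precisely because such a fixed word list is unavailable.

```python
# Tiny invented Turkish sentiment lexicon with per-word strengths in [0, 1].
POS = {"güzel": 0.9, "harika": 1.0}   # "nice", "great"
NEG = {"kötü": 0.8, "berbat": 1.0}    # "bad", "awful"

def project(tokens):
    """Project a token list onto a dense 2-d vector:
    (total positive strength, total negative strength)."""
    pos = sum(POS.get(t, 0.0) for t in tokens)
    neg = sum(NEG.get(t, 0.0) for t in tokens)
    return (pos, neg)
```

Keeping the two strengths as separate dimensions, rather than a single signed score, lets a text express mixed sentiment, which is what the multi-variant scale captures.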