114 research outputs found
Emoticon-based Ambivalent Expression: A Hidden Indicator for Unusual Behaviors in Weibo
Recent decades have witnessed online social media becoming a big-data window for
quantitatively testing conventional social theories and exploring human
behavioral patterns in much greater detail. In this paper, by tracing emoticon use
in Weibo, a group of hidden "ambivalent users" is revealed, who frequently
post ambivalent tweets containing both positive and negative emotions.
Further investigation reveals that this ambivalent expression could be a novel
indicator of many unusual social behaviors. For instance, ambivalent users, who
are mostly female, tend to post at midnight or on weekends.
They mention their close friends frequently in ambivalent tweets, which attract
more replies and thus serve as a more private channel of communication. Ambivalent
users also respond differently to public affairs from others and demonstrate
more interests in entertainment and sports events. Moreover, the sentiment
shift of words adopted in ambivalent tweets is more evident than usual and
exhibits a clear "negative to positive" pattern. The above observations, though
seemingly disparate, actually point to the self-regulation of negative
mood in Weibo, which is grounded in the emotion management theories
of sociology but extends them in an interesting way to the online environment.
Finally, as an interesting corollary, ambivalent users are found to be connected with
compulsive buyers and turn out to be perfect targets for online marketing.
Comment: Data sets can be downloaded freely from www.datatang.com/data/47207
or http://pan.baidu.com/s/1mg67cbm. For any issues, feel free to contact
[email protected]
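The notion of an "ambivalent tweet" in the abstract above (a post containing both positive and negative emoticons) can be sketched as follows. This is a minimal illustration, not the authors' method: the emoticon lexicons `POSITIVE` and `NEGATIVE` and the whitespace tokenization are assumptions, since the abstract does not list the Weibo emoticons that were used.

```python
# Hypothetical emoticon lexicons (assumption: the paper's Weibo sets are not given here).
POSITIVE = {":)", ":-)", ":D", "^_^"}
NEGATIVE = {":(", ":-(", ":'(", "T_T"}


def emoticon_polarities(text):
    """Return the set of polarities ('pos', 'neg') of emoticons found in text."""
    found = set()
    for token in text.split():  # naive whitespace tokenization (assumption)
        if token in POSITIVE:
            found.add("pos")
        elif token in NEGATIVE:
            found.add("neg")
    return found


def is_ambivalent(text):
    """A tweet is ambivalent if it carries both positive and negative emoticons."""
    return emoticon_polarities(text) == {"pos", "neg"}


def ambivalent_ratio(tweets):
    """Fraction of a user's tweets that are ambivalent (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    return sum(is_ambivalent(t) for t in tweets) / len(tweets)
```

A user could then be flagged as "ambivalent" when `ambivalent_ratio` exceeds some threshold; the abstract does not state one, so any cutoff would be a further assumption.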
Sentimental Analysis of Twitter Data using Classifier Algorithms
Microblogging has become a daily routine for most people in the world. Through microblogging, people get opinions about events not only in their own country but also worldwide. Twitter is one such online social networking website where people can post their views. It is a huge platform with over 316 million registered users from all over the world. It enables users to send and read short messages of up to 140 characters, for compatibility with SMS messaging. A good sentiment analysis of the data on this platform can enable many applications, such as movie reviews, product reviews, spam detection, and understanding consumer needs. In this paper, we have devised a new algorithm with which the above needs can be met. Our algorithm combines three techniques for sentiment analysis and can be called a hybrid algorithm: (1) hashtag classification for topic modeling; (2) a Naïve Bayes classifier for polarity classification; (3) emoticon analysis for neutral-polarity data. These techniques individually have some limitations for sentiment analysis
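The three-part hybrid described in this abstract (hashtags for topics, Naïve Bayes for polarity, emoticons for otherwise neutral tweets) could be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the emoticon sets, the `margin` threshold, the regex tokenizer, and the label names `"pos"` and `"neg"` are all hypothetical, and the classifier is a plain multinomial Naïve Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoticon sets (assumption, not taken from the paper).
POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-(", ":'("}


def hashtags(tweet):
    """Step 1: use hashtags as a crude topic signal."""
    return [h.lower() for h in re.findall(r"#(\w+)", tweet)]


def tokens(tweet):
    return re.findall(r"[a-z']+", tweet.lower())


class NaiveBayes:
    """Step 2: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in tokens(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores


def classify(tweet, model, margin=0.5):
    """Assumes the model was trained with labels 'pos' and 'neg'."""
    scores = model.log_scores(tweet)
    diff = scores["pos"] - scores["neg"]
    if abs(diff) >= margin:
        polarity = "pos" if diff > 0 else "neg"
    else:
        # Step 3: the text looks neutral, so fall back to emoticon counts.
        words = tweet.split()
        p = sum(w in POS_EMOTICONS for w in words)
        n = sum(w in NEG_EMOTICONS for w in words)
        polarity = "pos" if p > n else "neg" if n > p else "neutral"
    return {"topics": hashtags(tweet), "polarity": polarity}
```

The fallback order is a design choice made for this sketch: emoticons are consulted only when the word-based scores are too close to call, which mirrors the abstract's description of emoticon analysis being applied to neutral-polarity data.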
Sentiment analysis of health care tweets: review of the methods used.
BACKGROUND: Twitter is a microblogging service where users can send and read short 140-character messages called "tweets." There are many unstructured, free-text tweets relating to health care being shared on Twitter, which is becoming a popular area for health care research. Sentiment is a metric commonly used to investigate the positive or negative opinion within these messages. Exploring the methods used for sentiment analysis in Twitter health care research may allow us to better understand the options available for future research in this growing field. OBJECTIVE: The first objective of this study was to understand which tools would be available for sentiment analysis of Twitter health care research, by reviewing existing studies in this area and the methods they used. The second objective was to determine which method would work best in the health care settings, by analyzing how the methods were used to answer specific health care questions, their production, and how their accuracy was analyzed. METHODS: A review of the literature was conducted pertaining to Twitter and health care research, which used a quantitative method of sentiment analysis for the free-text messages (tweets). The study compared the types of tools used in each case and examined methods for tool production, tool training, and analysis of accuracy. RESULTS: A total of 12 papers studying the quantitative measurement of sentiment in the health care setting were found. More than half of these studies produced tools specifically for their research, 4 used open source tools available freely, and 2 used commercially available software. Moreover, 4 out of the 12 tools were trained using a smaller sample of the study's final data. The sentiment method was trained against, on average, 0.45% (2816/627,024) of the total sample data. One of the 12 papers commented on the analysis of accuracy of the tool used.
CONCLUSIONS: Multiple methods are used for sentiment analysis of tweets in the health care setting. These range from self-produced basic categorizations to more complex and expensive commercial software. The open source and commercial methods are developed on product reviews and generic social media messages. None of these methods have been extensively tested against a corpus of health care messages to check their accuracy. This study suggests that there is a need for an accurate and tested tool for sentiment analysis of tweets trained using a health care setting-specific corpus of manually annotated tweets first
Lisbon Emoji and Emoticon Database (LEED): norms for emoji and emoticons in seven evaluative dimensions
The use of emoticons and emoji is increasingly popular across a variety of new platforms of online communication. They have also become popular as stimulus materials in scientific research. However, the assumption that emoji/emoticon users’ interpretations always correspond to the developers’/researchers’ intended meanings might be misleading. This article presents subjective norms of emoji and emoticons provided by everyday users. The Lisbon Emoji and Emoticon Database (LEED) comprises 238 stimuli: 85 emoticons and 153 emoji (collected from iOS, Android, Facebook, and Emojipedia). The sample included 505 Portuguese participants recruited online. Each participant evaluated a random subset of 20 stimuli for seven dimensions: aesthetic appeal, familiarity, visual complexity, concreteness, valence, arousal, and meaningfulness. Participants were additionally asked to attribute a meaning to each stimulus. The norms obtained include quantitative descriptive results (means, standard deviations, and confidence intervals) and a meaning analysis for each stimulus. We also examined the correlations between the dimensions and tested for differences between emoticons and emoji, as well as between the two major operating systems (Android and iOS). The LEED constitutes a readily available normative database (available at www.osf.io/nua4x) with potential applications to different research domains.
Diffusion of Lexical Change in Social Media
Computer-mediated communication is driving fundamental changes in the nature
of written language. We investigate these changes by statistical analysis of a
dataset comprising 107 million Twitter messages (authored by 2.7 million unique
user accounts). Using a latent vector autoregressive model to aggregate across
thousands of words, we identify high-level patterns in diffusion of linguistic
change over the United States. Our model is robust to unpredictable changes in
Twitter's sampling rate, and provides a probabilistic characterization of the
relationship of macro-scale linguistic influence to a set of demographic and
geographic predictors. The results of this analysis offer support for prior
arguments that focus on geographical proximity and population size. However,
demographic similarity -- especially with regard to race -- plays an even more
central role, as cities with similar racial demographics are far more likely to
share linguistic influence. Rather than moving towards a single unified
"netspeak" dialect, language evolution in computer-mediated communication
reproduces existing fault lines in spoken American English.
Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311
Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM
Sentiment analysis on large-scale social media data is important to bridge
the gaps between social media contents and real world activities including
political election prediction, individual and public emotional status
monitoring and analysis, and so on. Although textual sentiment analysis has
been well studied on platforms such as Twitter and Instagram, analysis of
the role of extensive emoji use in sentiment analysis remains limited. In this
paper, we propose a novel scheme for Twitter sentiment analysis with extra
attention on emojis. We first learn bi-sense emoji embeddings under positive
and negative sentimental tweets individually, and then train a sentiment
classifier by attending on these bi-sense emoji embeddings with an
attention-based long short-term memory network (LSTM). Our experiments show
that the bi-sense embedding is effective for extracting sentiment-aware
embeddings of emojis and outperforms the state-of-the-art models. We also
visualize the attentions to show that the bi-sense emoji embedding provides
better guidance on the attention mechanism to obtain a more robust
understanding of the semantics and sentiments
Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold
Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets has been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at target (entity) level, is the lack of distinct sentiment annotations for the tweets and for the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may present different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions including: total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets
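The distinction this abstract draws between tweet-level and entity-level labels can be made concrete with a small record type. This is only a sketch of the idea, not the STS-Gold file format: the class `AnnotatedTweet`, its field names, and the label strings are hypothetical, and the "negative" label for iPad is inferred from the example tweet rather than stated in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTweet:
    """Hypothetical record holding one tweet-level label and per-entity labels."""
    text: str
    tweet_label: str
    entity_labels: dict = field(default_factory=dict)


example = AnnotatedTweet(
    text="I love iPhone, but I hate iPad",
    tweet_label="mixed",
    # iPhone -> positive is given in the abstract; iPad -> negative is an assumption.
    entity_labels={"iPhone": "positive", "iPad": "negative"},
)


def entities_disagreeing_with_tweet(tweet):
    """List entities whose label differs from the tweet-level label."""
    return sorted(
        entity
        for entity, label in tweet.entity_labels.items()
        if label != tweet.tweet_label
    )
```

Here both entities differ from the tweet-level "mixed" label, which is the case a dataset with a single label per tweet cannot represent.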