Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users can interact and
post texts of up to 280 characters called tweets. Hashtags, hyperlinked words
in tweets, have increasingly become crucial for tweet retrieval and search.
Using hashtags for tweet topic classification is challenging because of
context dependence among words, slang, abbreviations, and emoticons in a short
tweet, along with the evolving use of hashtags. Since Twitter generates millions
of tweets daily, tweet analytics is a fundamental big data stream problem that
often requires real-time distributed processing. This paper proposes a
distributed online approach to tweet topic classification with hashtags.
Implemented on Apache Storm, a distributed real-time framework, our approach
incrementally identifies and updates a set of strong predictors in the Naïve
Bayes model for classifying each incoming tweet instance. Preliminary
experiments show promising results with up to 97% accuracy and 37% increase in
throughput on eight processors.
Comment: IEEE International Conference on Big Data 201
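The incremental update described in this abstract can be illustrated with a minimal single-process sketch. It assumes a multinomial Naïve Bayes model with Laplace smoothing whose counts are updated one tweet at a time; the class name `OnlineNaiveBayes`, its methods, and the sample tokens are hypothetical, and neither the paper's selection of strong predictors nor its Apache Storm topology is reproduced here.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one tweet at a time.

    Illustrative only: this is not the paper's implementation.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_counts = Counter()            # tweets seen per topic
        self.word_counts = defaultdict(Counter)  # topic -> token -> count
        self.vocab = set()                       # all tokens seen so far

    def update(self, tokens, topic):
        """Fold one labelled tweet into the running counts."""
        self.class_counts[topic] += 1
        self.word_counts[topic].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        """Return the topic with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, n in self.class_counts.items():
            counts = self.word_counts[topic]
            # The extra "+ 1" reserves one vocabulary slot for unseen tokens.
            denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


model = OnlineNaiveBayes()
model.update(["#nba", "game", "score"], "sports")
model.update(["#election", "vote"], "politics")
prediction = model.predict(["#nba", "tonight"])
```

Because the model keeps only counts, each incoming tweet costs one `update` call, which is the property that makes this family of classifiers suitable for stream processing.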
The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition
Negators, modals, and degree adverbs can significantly affect the sentiment
of the words they modify. Often, their impact is modeled with simple
heuristics, although recent work has shown that such heuristics do not capture
the true sentiment of multi-word phrases. We created a dataset of phrases that
include various negators, modals, and degree adverbs, as well as their
combinations. Both the phrases and their constituent content words were
annotated with real-valued scores of sentiment association. Using phrasal terms
in the created dataset, we analyze the impact of individual modifiers and the
average effect of the groups of modifiers on overall sentiment. We find that
the effect of modifiers varies substantially among the members of the same
group. Furthermore, each individual modifier can affect sentiment words in
different ways. Therefore, solutions based on statistical learning seem more
promising than fixed hand-crafted rules for the task of automatic sentiment
prediction.
Comment: In Proceedings of the 7th Workshop on Computational Approaches to
Subjectivity, Sentiment and Social Media Analysis (WASSA), San Diego,
California, 201
Combining Sentiment Lexicons of Arabic Terms
Lexicons are dictionaries of sentiment words and their matching polarity. Some comprise words that are numerically scored based on the degree of positivity or negativity of the underlying sentiment. The score ranges differ because each lexicon has its own scoring process. Others tag words with polarity labels (i.e., positive/negative/neutral) instead of scores. Lexicons are important in text mining and sentiment analysis, which compels researchers to develop and publish them. Larger lexicons train sentiment models better, so the models classify sentiment in text more accurately. Hence, it is useful to combine the various available lexicons. However, there are many duplicates, overlaps, and contradictions between these lexicons. In this paper, we define a method to combine different lexicons. We used the method to normalize and unify lexicon items and to merge duplicated items from twelve lexicons for (in)formal Arabic. This resulted in a coherent Arabic sentiment lexicon with the largest number of terms
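The normalize-then-merge idea in this abstract might look like the following sketch. The abstract does not specify the rules, so everything here is an assumption made for illustration: scored lexicons are rescaled to [-1, 1] by their largest absolute score, polarity labels map to -1, 0, and 1, and duplicated terms are resolved by averaging. The function names `normalize` and `combine` are hypothetical, and English placeholder terms stand in for Arabic entries.

```python
from collections import defaultdict

# Assumed mapping from polarity labels to a common numeric scale.
LABEL_SCORES = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def normalize(lexicon):
    """Map one lexicon's entries onto a common [-1, 1] scale."""
    if all(isinstance(v, str) for v in lexicon.values()):
        # Label-based lexicon: translate each tag to its numeric score.
        return {term: LABEL_SCORES[label] for term, label in lexicon.items()}
    # Score-based lexicon: divide by the largest absolute score so that
    # the sign (polarity) of every entry is preserved.
    peak = max(abs(v) for v in lexicon.values()) or 1.0
    return {term: v / peak for term, v in lexicon.items()}


def combine(lexicons):
    """Merge normalized lexicons, averaging the scores of duplicated terms."""
    pooled = defaultdict(list)
    for lexicon in lexicons:
        for term, score in normalize(lexicon).items():
            pooled[term].append(score)
    return {term: sum(scores) / len(scores) for term, scores in pooled.items()}


scored = {"good": 4, "bad": -2}                      # numeric lexicon
labelled = {"good": "positive", "fine": "neutral"}   # label-based lexicon
merged = combine([scored, labelled])
```

Averaging is only one way to settle contradictions between lexicons; a majority vote or a per-lexicon weighting would slot into `combine` in the same place.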
- …