2 research outputs found
Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users can interact and
post texts of up to 280 characters called tweets. Hashtags, hyperlinked words
in tweets, have increasingly become crucial for tweet retrieval and search.
Using hashtags for tweet topic classification is a challenging problem because
of context dependent among words, slangs, abbreviation and emoticons in a short
tweet along with evolving use of hashtags. Since Twitter generates millions of
tweets daily, tweet analytics is a fundamental problem of Big data stream that
often requires a real-time Distributed processing. This paper proposes a
distributed online approach to tweet topic classification with hashtags. Being
implemented on Apache Storm, a distributed real time framework, our approach
incrementally identifies and updates a set of strong predictors in the Na\"ive
Bayes model for classifying each incoming tweet instance. Preliminary
experiments show promising results with up to 97% accuracy and 37% increase in
throughput on eight processors.Comment: IEEE International Conference on Big Data 201