Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users can interact and
post texts of up to 280 characters called tweets. Hashtags, hyperlinked words
in tweets, have increasingly become crucial for tweet retrieval and search.
Using hashtags for tweet topic classification is a challenging problem because
of context dependency among words, slang, abbreviations and emoticons in a short
tweet, along with the evolving use of hashtags. Since Twitter generates millions of
tweets daily, tweet analytics is a fundamental Big Data stream problem that
often requires real-time distributed processing. This paper proposes a
distributed online approach to tweet topic classification with hashtags.
Implemented on Apache Storm, a distributed real-time framework, our approach
incrementally identifies and updates a set of strong predictors in the Naïve
Bayes model for classifying each incoming tweet instance. Preliminary
experiments show promising results, with up to 97% accuracy and a 37% increase in
throughput on eight processors. Comment: IEEE International Conference on Big Data 201
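The core idea of updating a Naïve Bayes model one tweet at a time can be illustrated with a minimal single-process sketch. It assumes a multinomial model with Laplace smoothing, treats hashtags as ordinary tokens, and uses a hypothetical whitespace `tokenize` helper. It does not reproduce the paper's strong-predictor selection or its distribution across Apache Storm workers.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lower-case and split on whitespace (hashtags kept as tokens)."""
    return text.lower().split()


class OnlineNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one labelled tweet at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter()             # topic -> number of tweets seen
        self.token_counts = defaultdict(Counter)  # topic -> token -> count
        self.total_tokens = Counter()             # topic -> total token count
        self.vocab = set()                        # every token seen so far

    def update(self, tokens, topic):
        """Incorporate one labelled tweet into the running counts."""
        self.class_counts[topic] += 1
        for tok in tokens:
            self.token_counts[topic][tok] += 1
            self.total_tokens[topic] += 1
            self.vocab.add(tok)

    def predict(self, tokens):
        """Return the topic with the highest log-posterior (None if no data yet)."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for topic, c in self.class_counts.items():
            score = math.log(c / n)  # log prior
            denom = self.total_tokens[topic] + self.alpha * v
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((self.token_counts[topic][tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = topic, score
        return best


nb = OnlineNaiveBayes()
nb.update(tokenize("#NBA game tonight"), "sports")
nb.update(tokenize("#election vote poll"), "politics")
print(nb.predict(tokenize("#nba tonight")))  # sports
```

Because every update only increments counters, the model can absorb a stream without retraining; in a distributed setting those counters are the state that would need to be partitioned or merged.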
Population Density-based Hospital Recommendation with Mobile LBS Big Data
The difficulty of getting medical treatment is one of the major livelihood issues
in China. Since patients lack prior knowledge about the spatial distribution
and the capacity of hospitals, some hospitals have abnormally high or sporadic
population densities. This paper presents a new model for estimating the
spatiotemporal population density in each hospital based on location-based
service (LBS) big data, which would be beneficial to guiding and dispersing
outpatients. To improve the estimation accuracy, several approaches are
proposed to denoise the LBS data and classify people by detecting their various
behaviors. In addition, a deep learning model based on long short-term memory
(LSTM) is presented to predict the trend of population density. Using Baidu's
large-scale LBS log database, we apply the proposed model to 113 hospitals in
Beijing, P. R. China, and construct an online hospital recommendation system
that provides users with a ranked list of hospitals based on real-time
population density information and basic hospital information, such as
hospital level and distance. We also mine several interesting
patterns from these LBS logs using the proposed system.
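The density-then-rank pipeline can be sketched as follows, under loudly stated assumptions: each LBS ping is already matched to a hospital geofence, counting each user at most once per hour is used as a crude stand-in for the paper's denoising, and the linear ranking score with its weights (`w_density`, `w_level`, `w_dist`) is hypothetical. The LSTM trend prediction and behavior classification are not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ping:
    user_id: str
    hospital_id: str  # hospital whose geofence contains the ping (assumed precomputed)
    timestamp: int    # Unix seconds


def hourly_density(pings):
    """Count distinct users per (hospital, hour) bin.

    Repeated pings from the same user in the same hour count once, so a
    single phone reporting often does not inflate the estimate.
    """
    users = defaultdict(set)
    for p in pings:
        users[(p.hospital_id, p.timestamp // 3600)].add(p.user_id)
    return {key: len(ids) for key, ids in users.items()}


def rank_hospitals(density, level, distance_km, w_density=1.0, w_level=1.0, w_dist=1.0):
    """Rank hospital ids: higher level is better, crowding and distance are worse.

    density, level and distance_km are dicts keyed by hospital id; density is
    normalized by its maximum so it is on a [0, 1] scale. Weights are hypothetical.
    """
    peak = max(density.values(), default=0) or 1  # avoid division by zero

    def score(h):
        crowd = density.get(h, 0) / peak
        return w_level * level[h] - w_density * crowd - w_dist * distance_km[h]

    return sorted(level, key=score, reverse=True)


pings = [Ping("u1", "A", 0), Ping("u1", "A", 10), Ping("u2", "A", 20), Ping("u3", "B", 30)]
dens = hourly_density(pings)
now = {h: n for (h, hour), n in dens.items() if hour == 0}
print(rank_hospitals(now, level={"A": 3, "B": 3}, distance_km={"A": 1.0, "B": 1.0}))
```

With equal level and distance, the less crowded hospital `B` ranks first; a real system would replace the current-hour counts with predicted densities.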
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces Memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose, incremental
and unsupervised data storage and retrieval system that can be applied to all
types of signals or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems, such as e-Discovery or contextual navigation. It can also be formulated in
the artificial life framework, a.k.a. Conway's "Game of Life" theory. In contrast
to Conway's approach, the objects evolve in a massively multidimensional space.
To begin evaluating the potential of mARC, we have built a mARC-based
Internet search engine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search in terms of both
performance and relevance. In the study, we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries.