10,781 research outputs found
Extracting News Events from Microblogs
Twitter stream has become a large source of information for many people, but
the magnitude of tweets and the noisy nature of its content have made
harvesting the knowledge from Twitter a challenging task for researchers for a
long time. Aiming at overcoming some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network or deep
learning to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach on detecting
current (real) news events delivers a state-of-the-art performance
Discovering Power Laws in Entity Length
This paper presents a discovery that the length of the entities in various
datasets follows a family of scale-free power law distributions. The concept of
entity here broadly includes the named entity, entity mention, time expression,
aspect term, and domain-specific entity that are well investigated in natural
language processing and related areas. The entity length denotes the number of
words in an entity. The power law distributions in entity length possess the
scale-free property and have well-defined means and finite variances. We
explain the phenomenon of power laws in entity length by the principle of least
effort in communication and the preferential mechanism
Extracting semantic entities and events from sports tweets
Large volumes of user-generated content on practically every major issue and event are being created on the microblogging site Twitter. This content can be combined and processed to detect events, entities and popular moods to feed various knowledge-intensive practical applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. In this paper, we exploit various approaches to detect the named entities and significant micro-events from users’ tweets during a live sports event. Here we describe how combining linguistic features with background knowledge and the use of Twitter-specific features can achieve high, precise detection results (f-measure = 87%) in different datasets. A study was conducted on tweets from cricket matches in the ICC World Cup in order to augment the event-related non-textual media with collective intelligence
Discovering conversational topics and emotions associated with Demonetization tweets in India
Social media platforms contain great wealth of information which provides us
opportunities explore hidden patterns or unknown correlations, and understand
people's satisfaction with what they are discussing. As one showcase, in this
paper, we summarize the data set of Twitter messages related to recent
demonetization of all Rs. 500 and Rs. 1000 notes in India and explore insights
from Twitter's data. Our proposed system automatically extracts the popular
latent topics in conversations regarding demonetization discussed in Twitter
via the Latent Dirichlet Allocation (LDA) based topic model and also identifies
the correlated topics across different categories. Additionally, it also
discovers people's opinions expressed through their tweets related to the event
under consideration via the emotion analyzer. The system also employs an
intuitive and informative visualization to show the uncovered insight.
Furthermore, we use an evaluation measure, Normalized Mutual Information (NMI),
to select the best LDA models. The obtained LDA results show that the tool can
be effectively used to extract discussion topics and summarize them for further
manual analysis.Comment: 6 pages, 11 figures. arXiv admin note: substantial text overlap with
arXiv:1608.02519 by other authors; text overlap with arXiv:1705.08094 by
other author
- …