Context Modeling for Ranking and Tagging Bursty Features in Text Streams
Bursty features in text streams are very useful in many text mining applications. Most existing studies detect bursty features based purely on term frequency changes without taking into account the semantic contexts of terms, and as a result the detected bursty features may not always be interesting or easy to interpret. In this paper we propose to model the contexts of bursty features using a language modeling approach. We then propose a novel topic diversity-based metric using the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of a stream of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by newsworthiness and assign meaningful tags to annotate them. © 2010 ACM.
Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach
The ever-growing number of people using Twitter makes it a valuable source of
timely information. However, detecting events in Twitter is a difficult task,
because tweets that report interesting events are overwhelmed by a large volume
of tweets on unrelated topics. Existing methods focus on the textual content of
tweets and ignore the social aspect of Twitter. In this paper we propose MABED
(i.e. mention-anomaly-based event detection), a novel statistical method that
relies solely on tweets and leverages the creation frequency of dynamic links
(i.e. mentions) that users insert in tweets to detect significant events and
estimate the magnitude of their impact over the crowd. MABED also differs from
the literature in that it dynamically estimates the period of time during which
each event is discussed, rather than assuming a predefined fixed duration for
all events. The experiments we conducted on both English and French Twitter
data show that the mention-anomaly-based approach leads to more accurate event
detection and improved robustness in presence of noisy Twitter content.
Qualitatively speaking, we find that MABED helps with the interpretation of
detected events by providing clear textual descriptions and precise temporal
descriptions. We also show how MABED can help in understanding users' interests.
Furthermore, we describe three visualizations designed to favor an efficient
exploration of the detected events.
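A minimal sketch of the mention-anomaly idea described above, not the authors' implementation. It assumes the anomaly of a word in a time slice is the observed number of mention-bearing tweets containing it minus the number expected at a constant corpus-wide rate, and that the event period is the contiguous run of slices with the largest summed anomaly. The function names and the baseline model are illustrative assumptions.

```python
from typing import Sequence


def mention_anomaly(mentions_with_word: Sequence[int],
                    tweets_per_slice: Sequence[int]) -> list[float]:
    """Observed minus expected count of mention-tweets containing a word, per slice."""
    total_tweets = sum(tweets_per_slice)
    if total_tweets == 0:
        return [0.0] * len(tweets_per_slice)
    # Assumption: with no event, the word occurs at one constant rate.
    rate = sum(mentions_with_word) / total_tweets
    return [obs - n * rate for obs, n in zip(mentions_with_word, tweets_per_slice)]


def strongest_interval(anomaly: Sequence[float]) -> tuple[int, int, float]:
    """Contiguous slice range (inclusive) with the largest summed anomaly."""
    if not anomaly:
        raise ValueError("anomaly series is empty")
    best = (0, 0, anomaly[0])
    start, running = 0, 0.0
    for i, value in enumerate(anomaly):
        if running <= 0:
            start, running = i, value  # restart the interval at this slice
        else:
            running += value
        if running > best[2]:
            best = (start, i, running)
    return best
```

Because the interval is chosen per word from the anomaly series itself, no fixed event duration has to be set in advance, which is the property the abstract emphasizes.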
Mapping Topics and Topic Bursts in PNAS
Scientific research is highly dynamic. New areas of science continually
evolve; others gain or lose importance, merge or split. Due to the steady
increase in the number of scientific publications it is hard to keep an
overview of the structure and dynamic development of one's own field of
science, much less all scientific domains. However, knowledge of hot topics,
emergent research frontiers, or change of focus in certain areas is a critical
component of resource allocation decisions in research labs, governmental
institutions, and corporations. This paper demonstrates the utilization of
Kleinberg's burst detection algorithm, co-word occurrence analysis, and graph
layout techniques to generate maps that support the identification of major
research topics and trends. The approach was applied to analyze and map the
complete set of papers published in the Proceedings of the National Academy of
Sciences (PNAS) in the years 1982-2001. Six domain experts examined and
commented on the resulting maps in an attempt to reconstruct the evolution of
major research areas covered by PNAS.
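For readers unfamiliar with burst detection, the following is a simplified two-state sketch in the spirit of Kleinberg's batched model, not the code used in the paper. It assumes a baseline rate equal to the overall fraction of relevant documents, a burst rate `s` times higher, a binomial fit cost per time step, and a cost of `gamma * ln(n)` for entering the burst state; the parameter names and defaults are assumptions.

```python
import math
from typing import Sequence


def two_state_bursts(relevant: Sequence[int], totals: Sequence[int],
                     s: float = 2.0, gamma: float = 1.0) -> list[int]:
    """Minimum-cost state sequence: 0 = baseline, 1 = burst."""
    n = len(relevant)
    if n == 0 or sum(totals) == 0:
        return [0] * n
    p0 = sum(relevant) / sum(totals)
    if p0 <= 0.0 or p0 >= 1.0:
        return [0] * n  # degenerate rate: nothing to distinguish
    probs = (p0, min(0.9999, s * p0))
    up_cost = gamma * math.log(n)  # assumed cost of moving from state 0 to 1

    def fit_cost(p: float, d: int, r: int) -> float:
        # Negative log-likelihood of d relevant out of r under Binomial(r, p).
        log_choose = math.lgamma(r + 1) - math.lgamma(d + 1) - math.lgamma(r - d + 1)
        return -(log_choose + d * math.log(p) + (r - d) * math.log(1.0 - p))

    cost = [0.0, math.inf]  # the automaton starts in the baseline state
    back = []
    for d, r in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[i_best] + fit_cost(probs[j], d, r))
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the final time step.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states
```

Applied to yearly counts of papers containing a word, runs of state 1 mark the periods in which that word bursts.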
SURGE: Continuous Detection of Bursty Regions Over a Stream of Spatial Objects
With the proliferation of mobile devices and location-based services,
the continuous generation of massive volumes of streaming spatial objects (i.e.,
geo-tagged data) opens up new opportunities to address real-world problems by
analyzing them. In this paper, we present a novel continuous bursty region
detection problem that aims to continuously detect a bursty region of a given
size in a specified geographical area from a stream of spatial objects.
Specifically, a bursty region shows the maximum spike in the number of spatial
objects in a given time window. The problem is useful in addressing several
real-world challenges such as the surge pricing problem in online transportation
and disease outbreak detection. To solve the problem, we propose an exact
solution and two approximate solutions, and the approximation ratio is
(1 - α)/4 in terms of the burst score, where α is a parameter
to control the burst score. We further extend these solutions to support
detection of top-k bursty regions. Extensive experiments with real-world data
are conducted to demonstrate the efficiency and effectiveness of our solutions.
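To make the problem statement concrete, here is a naive grid-based baseline, which is neither the exact nor the approximate solution proposed in the paper. It assumes a fixed square grid and a hypothetical burst score defined as the count in the current time window minus the count in the preceding window of equal length; the paper's own burst score may differ.

```python
from collections import Counter
from typing import Iterable, Optional

Point = tuple[float, float, float]  # (x, y, timestamp)
Cell = tuple[int, int]


def burstiest_cell(points: Iterable[Point], now: float, window: float,
                   cell_size: float) -> Optional[tuple[Cell, int]]:
    """Grid cell with the largest rise in object count between two windows."""
    current: Counter = Counter()
    previous: Counter = Counter()
    for x, y, t in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if now - window < t <= now:
            current[cell] += 1
        elif now - 2 * window < t <= now - window:
            previous[cell] += 1
    if not current:
        return None  # no objects arrived in the current window
    # Assumed score: current-window count minus previous-window count.
    best = max(current, key=lambda c: current[c] - previous[c])
    return best, current[best] - previous[best]
```

This rescans every point on each call and only considers grid-aligned regions, which is the kind of cost and restriction a continuous detection method has to avoid.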
Precursors and Laggards: An Analysis of Semantic Temporal Relationships on a Blog Network
We explore the hypothesis that it is possible to obtain information about the
dynamics of a blog network by analysing the temporal relationships between
blogs at a semantic level, and that this type of analysis adds to the knowledge
that can be extracted by studying the network only at the structural level of
URL links. We present an algorithm to automatically detect fine-grained
discussion topics, characterized by n-grams and time intervals. We then propose
a probabilistic model to estimate the temporal relationships that blogs have
with one another. We define the precursor score of blog A in relation to blog B
as the probability that A enters a new topic before B, discounting the effect
created by asymmetric posting rates. Network-level metrics of precursor and
laggard behavior are derived from these dyadic precursor score estimations.
This model is used to analyze a network of French political blogs. The scores
are compared to traditional link degree metrics. We obtain insights into the
dynamics of topic participation on this network, as well as the relationship
between precursor/laggard and linking behaviors. We validate and analyze
results with the help of an expert on the French blogosphere. Finally, we
propose possible applications to the improvement of search engine ranking
algorithms.
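As a rough illustration of the dyadic precursor idea, the sketch below computes only the raw frequency with which one blog enters shared topics before another. It deliberately omits the correction for asymmetric posting rates that the abstract describes, so it is a simplified stand-in rather than the authors' probabilistic model; the data layout and function name are assumptions.

```python
from typing import Mapping, Optional


def raw_precedence(entry_times: Mapping[str, Mapping[str, float]],
                   a: str, b: str) -> Optional[float]:
    """Fraction of topics entered by both blogs in which `a` entered before `b`.

    `entry_times` maps topic -> {blog: time of the blog's first post on the topic}.
    """
    shared = [(times[a], times[b])
              for times in entry_times.values()
              if a in times and b in times]
    if not shared:
        return None  # the two blogs never discussed a common topic
    return sum(ta < tb for ta, tb in shared) / len(shared)
```

A blog that simply posts more often will tend to score higher on this raw measure, which is why the abstract stresses discounting posting-rate asymmetry.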