A Semantic Graph-Based Approach for Mining Common Topics From Multiple Asynchronous Text Streams
In the age of Web 2.0, a substantial amount of unstructured
content is distributed through multiple text streams in an
asynchronous fashion, which makes it increasingly difficult
to glean and distill useful information. An effective way to
explore the information in text streams is topic modelling,
which can further facilitate other applications such as search,
information browsing, and pattern mining. In this paper, we
propose a semantic-graph-based topic modelling approach
for structuring asynchronous text streams. Our model integrates
topic mining and time synchronization, two core
modules for addressing the problem, into a unified model.
Specifically, to handle the lexical gap, we use a global
semantic graph for each timestamp to capture the hidden
interactions among entities from all the text streams.
To deal with the source asynchronism problem, local
semantic graphs are employed to discover similar topics of
different entities that can be potentially separated by time
gaps. Our experiments on two real-world datasets show that
the proposed model significantly outperforms the existing
ones.
Multi-layer model for the web graph
This paper studies stochastic graph models of the WebGraph. We present a new model that describes the WebGraph as an ensemble of different regions generated by independent stochastic processes (in the spirit of a recent paper by Dill et al. [VLDB 2001]). Models such as the Copying Model [17] and the Evolving Networks Model [3] are simulated and compared on several relevant measures, such as degree and clique distributions.
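As a rough illustration of what simulating such a model involves, the sketch below generates a graph with one common formulation of the linear-growth copying model and tallies in-degrees. It is not taken from the paper: the function name `copying_model`, the parameters `d` and `alpha`, and the choice of seed graph are all assumptions made for this example, and the variant in [17] may differ in detail.

```python
import random
from collections import Counter


def copying_model(n, d=3, alpha=0.2, seed=0):
    """Grow a directed graph with n nodes, each having d out-links.

    Assumed formulation: every new node picks a random earlier node as a
    prototype; each of its d links is either a uniformly random earlier
    node (probability alpha) or a copy of the prototype's i-th link.
    """
    rng = random.Random(seed)
    # Seed graph (an assumption): d + 1 nodes, each linking to the d others.
    out = {i: [j for j in range(d + 1) if j != i] for i in range(d + 1)}
    for v in range(d + 1, n):
        prototype = rng.randrange(v)
        links = []
        for i in range(d):
            if rng.random() < alpha:
                links.append(rng.randrange(v))      # uniform random target
            else:
                links.append(out[prototype][i])     # copy prototype's link
        out[v] = links
    return out


def in_degree_distribution(out_links):
    """Map each in-degree value to the number of nodes that have it."""
    in_deg = Counter(t for targets in out_links.values() for t in targets)
    return Counter(in_deg.get(node, 0) for node in out_links)


if __name__ == "__main__":
    graph = copying_model(2000)
    for degree, count in sorted(in_degree_distribution(graph).items())[:10]:
        print(degree, count)
```

Copying links from a prototype tends to reward nodes that are already well linked, which is why this family of models is usually compared on degree distributions.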
A rule dynamics approach to event detection in Twitter with its application to sports and politics
The increasing popularity of Twitter as a social network tool for opinion expression as well as information retrieval has resulted in the need to derive computational means to detect and track relevant topics/events in the network. The application of topic detection and tracking methods to tweets enables users to extract newsworthy content from the vast and somewhat chaotic Twitter stream. In this paper, we apply our technique, named Transaction-based Rule Change Mining, to extract newsworthy hashtag keywords present in tweets from two different domains, namely sports (The English FA Cup 2012) and politics (US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time-windows and update rates to each of the datasets in order to study their impact on performance. The results reveal that our approach is able to accurately detect and track newsworthy content. In addition, the results show that adapting the time-window yields better performance, especially on the sports dataset, which can be attributed to the usually shorter duration of football events.
AUGUR: Forecasting the Emergence of New Research Topics
Being able to rapidly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. The literature presents several approaches to identifying the emergence of new research topics, which rely on the assumption that the topic is already exhibiting a certain degree of popularity and is consistently referred to by a community of researchers. However, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. We address this issue by introducing Augur, a novel approach to the early detection of research topics. Augur analyses the diachronic relationships between research areas and is able to detect clusters of topics that exhibit dynamics correlated with the emergence of new research topics. Here we also present the Advanced Clique Percolation Method (ACPM), a new community detection algorithm developed specifically for supporting this task. Augur was evaluated on a gold standard of 1,408 debutant topics in the 2000-2011 interval and outperformed four alternative approaches in terms of both precision and recall.
Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation
We study the problem of quick detection of top-k Personalized PageRank lists.
This problem has a number of important applications such as finding local cuts
in large graphs, estimation of similarity distance and name disambiguation. In
particular, we apply our results to construct efficient algorithms for the
person name disambiguation problem. We argue that when finding top-k
Personalized PageRank lists, two observations are important. Firstly, it is
crucial to detect the top-k most important neighbours of a node quickly,
while the exact order in the top-k list and the exact PageRank values
are far less crucial. Secondly, a small number of wrong elements in the top-k
lists does not really degrade their quality, but tolerating them can lead to
significant computational savings. Based on these two key observations, we
propose Monte Carlo methods for fast detection of top-k Personalized PageRank
lists. We provide performance evaluation of the proposed methods and supply
stopping criteria. Then, we apply the methods to the person name disambiguation
problem. The developed algorithm for the person name disambiguation problem
achieved second place in the WePS 2010 competition.
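
The general idea of estimating a top-k Personalized PageRank list by Monte Carlo can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it runs a fixed number of random walks from the seed node, ends each walk with probability 1 - damping at every step, and ranks nodes by visit frequency. The function name `mc_topk_ppr`, the adjacency-dict graph format, the default parameters, and the treatment of dangling nodes (the walk simply ends) are assumptions, and the stopping criteria described in the abstract are not implemented.

```python
import random
from collections import Counter


def mc_topk_ppr(graph, seed, k, num_walks=10000, damping=0.85, rng=None):
    """Estimate the top-k Personalized PageRank list of `seed`.

    graph: dict mapping each node to a list of its out-neighbours.
    Returns a list of (node, estimated score) pairs, highest score first.
    """
    rng = rng or random.Random(0)
    visits = Counter()
    for _ in range(num_walks):
        node = seed
        while True:
            visits[node] += 1
            neighbours = graph.get(node, [])
            # End the walk at a dangling node, or with probability 1 - damping.
            if not neighbours or rng.random() > damping:
                break
            node = rng.choice(neighbours)
    total = sum(visits.values())
    # Normalised visit counts approximate the Personalized PageRank vector.
    return [(node, count / total) for node, count in visits.most_common(k)]


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
    for node, score in mc_topk_ppr(toy_graph, "a", k=2):
        print(node, round(score, 3))
```

Because only membership in the top-k list matters, a modest number of walks is often enough to separate the clearly important nodes from the rest, even when the individual scores are still noisy.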