Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme
Personalized PageRank (PPR) stands as a fundamental proximity measure
in graph mining. Since computing an exact single-source PPR (SSPPR) query answer is prohibitive,
most existing solutions turn to approximate queries with guarantees. The
state-of-the-art solutions for approximate SSPPR queries are index-based and
mainly focus on static graphs, while real-world graphs are usually dynamically
changing. However, existing index-update schemes cannot achieve a sub-linear
update time. Motivated by this, we present an efficient indexing scheme to
maintain indexed random walks in expected $O(1)$ time after each graph update.
To reduce the space consumption, we further propose a new sampling scheme to
remove the auxiliary data structure for vertices while still supporting
$O(1)$ index update cost on evolving graphs. Extensive experiments show that our
update scheme achieves orders of magnitude speed-up on update performance over
existing index-based dynamic schemes without sacrificing the query efficiency.
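To make the idea of random-walk-based SSPPR estimation concrete, here is a minimal Monte Carlo sketch. It is not the paper's index or its update scheme. The function name `ppr_monte_carlo`, the adjacency-dict representation, the stopping probability `alpha`, and the handling of dangling nodes are all assumptions made for illustration.

```python
import random
from collections import Counter


def ppr_monte_carlo(adj, source, alpha=0.2, num_walks=10000, seed=0):
    """Estimate single-source PPR from `source` by sampling random walks.

    adj: dict mapping each node to a list of its out-neighbors (assumed format).
    alpha: probability that a walk stops at each step (assumed value).
    Returns a dict mapping node -> fraction of walks that stopped there.
    """
    rng = random.Random(seed)
    stops = Counter()
    for _ in range(num_walks):
        node = source
        # Stop with probability alpha; otherwise move to a random out-neighbor.
        while rng.random() >= alpha:
            neighbors = adj.get(node, [])
            if neighbors:
                node = rng.choice(neighbors)
            else:
                # Dangling node: jump back to the source (one common convention,
                # chosen here as an assumption).
                node = source
        stops[node] += 1
    return {v: c / num_walks for v, c in stops.items()}


# Usage on a directed 3-cycle 0 -> 1 -> 2 -> 0.
cycle = {0: [1], 1: [2], 2: [0]}
estimate = ppr_monte_carlo(cycle, source=0)
```

An index-based method would store walks like these ahead of time and repair only the affected ones when an edge is inserted or deleted. That repair step is the part the abstract addresses, and it is not shown here.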
Red Light Green Light Method for Solving Large Markov Chains
Discrete-time discrete-state finite Markov chains are versatile mathematical
models for a wide range of real-life stochastic processes. One of the most common
tasks in studies of Markov chains is computation of the stationary
distribution. Without loss of generality, and drawing our motivation from
applications to large networks, we interpret this problem as one of computing
the stationary distribution of a random walk on a graph. We propose a new
controlled, easily distributed algorithm for this task, briefly summarized as
follows: at the beginning, each node receives a fixed amount of cash (positive
or negative), and at each iteration, some nodes receive a 'green light' to
distribute their wealth or debt proportionally to the transition probabilities
of the Markov chain; the stationary probability of a node is computed as a
ratio of the cash distributed by this node to the total cash distributed by
all nodes together. Our method includes as special cases a wide range of known,
very different, and previously disconnected methods including power iterations,
Gauss-Southwell, and online distributed algorithms. We prove exponential
convergence of our method, demonstrate its high efficiency, and derive
scheduling strategies for the green light that achieve a convergence rate faster
than state-of-the-art algorithms.
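The cash-distribution procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the initial cash is uniform and positive, exactly one node receives the green light per iteration, and the scheduling rule (pick the node holding the most cash in absolute value) is a hypothetical choice. The name `rlgl_stationary` is likewise made up for this example, and no convergence-rate claim is attached to this variant.

```python
def rlgl_stationary(P, num_steps=20000):
    """Estimate the stationary distribution of a row-stochastic matrix P.

    P: list of rows, each a list of transition probabilities summing to 1.
    Returns the cash distributed by each node divided by the total cash
    distributed by all nodes.
    """
    n = len(P)
    cash = [1.0 / n] * n   # initial cash per node (assumption: uniform)
    history = [0.0] * n    # total cash each node has distributed so far
    for _ in range(num_steps):
        # Scheduling rule (assumption): green light for the node that
        # currently holds the largest amount of cash in absolute value.
        i = max(range(n), key=lambda k: abs(cash[k]))
        amount = cash[i]
        cash[i] = 0.0
        history[i] += amount
        # Distribute the cash proportionally to the transition probabilities.
        for j, p in enumerate(P[i]):
            if p:
                cash[j] += amount * p
    total = sum(history)
    return [h / total for h in history]


# Usage: a 3-state chain whose stationary distribution is (0.25, 0.5, 0.25).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi_hat = rlgl_stationary(P)
```

Giving every node the green light at every iteration would instead make each step a multiplication of the cash vector by the transition matrix, which is how power iteration fits into the same framework.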
Quick Detection of High-degree Entities in Large Directed Networks
In this paper, we address the problem of quick detection of high-degree
entities in large online social networks. The practical importance of this problem
is attested by a large number of companies that continuously collect and update
statistics about popular entities, usually using the degree of an entity as an
approximation of its popularity. We suggest a simple, efficient, and
easy-to-implement two-stage randomized algorithm that provides highly accurate
solutions for this problem. For instance, our algorithm needs only one thousand
API requests in order to find the top-100 most followed users in Twitter, a
network with approximately a billion registered users, with more than 90%
precision. Our algorithm significantly outperforms existing methods and serves
many different purposes, such as finding the most popular users or the most
popular interest groups in social networks. An important contribution of this
work is the analysis of the proposed algorithm using Extreme Value Theory -- a
branch of probability that studies extreme events and properties of the largest
order statistics in random samples. Using this theory, we derive an accurate
prediction for the algorithm's performance and show that the number of API
requests for finding the top-k most popular entities is sublinear in the number
of entities. Moreover, we formally show that the high variability among the
entities, expressed through heavy-tailed distributions, is the reason for the
algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical
way.
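The abstract does not spell out the two stages, so the sketch below shows only one plausible two-stage sampling scheme for this kind of problem. In stage one, a random sample of users is queried and candidate entities are ranked by how often they appear in the sampled follow lists. In stage two, the exact degree is fetched for the top candidates only. Everything here is hypothetical: the name `top_k_two_stage`, the `following` dict, the `in_degree` callable standing in for an API request, and the budget parameters `n1` and `n2`.

```python
import random
from collections import Counter


def top_k_two_stage(following, in_degree, k, n1, n2, seed=0):
    """Find k high-degree entities using about n1 + n2 'API requests'.

    following: dict mapping user -> list of entities that user follows.
    in_degree: callable returning the exact follower count of an entity
               (stands in for one API request; an assumption).
    """
    rng = random.Random(seed)
    users = list(following)
    # Stage 1: sample n1 users and count how often each entity is followed.
    sample = rng.sample(users, min(n1, len(users)))
    counts = Counter()
    for user in sample:
        counts.update(following[user])
    # Stage 2: fetch exact degrees for the n2 most frequently seen entities.
    candidates = [entity for entity, _ in counts.most_common(n2)]
    exact = {entity: in_degree(entity) for entity in candidates}
    return sorted(exact, key=exact.get, reverse=True)[:k]


# Usage on a synthetic network with a few very popular entities.
following = {
    u: ["A"]
    + (["B"] if u % 2 == 0 else [])
    + (["C"] if u % 4 == 0 else [])
    + [f"x{u % 50}"]
    for u in range(1000)
}
degree = Counter(e for followed in following.values() for e in followed)
top3 = top_k_two_stage(following, lambda e: degree[e], k=3, n1=100, n2=10)
```

The example graph is deliberately skewed: a handful of entities have far more followers than the rest, so they show up often even in a small sample. This is the kind of variability the abstract credits for the algorithm's efficiency.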