A Web Aggregation Approach for Distributed Randomized PageRank Algorithms
The PageRank algorithm employed at Google assigns a measure of importance to
each web page for rankings in search results. In our recent papers, we have
proposed a distributed randomized approach for this algorithm, where web pages
are treated as agents computing their own PageRank by communicating with linked
pages. This paper builds upon this approach to reduce the computation and
communication loads for the algorithms. In particular, we develop a method to
systematically aggregate the web pages into groups by exploiting the sparsity
inherent in the web. For each group, an aggregated PageRank value is computed,
which can then be distributed among the group members. We provide a distributed
update scheme for the aggregated PageRank along with an analysis of its
convergence properties. The method is especially motivated by results on
singular perturbation techniques for large-scale Markov chains and multi-agent
consensus.
Comment: To appear in the IEEE Transactions on Automatic Control, 201
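For orientation, the quantity being computed can be illustrated with a plain centralized power iteration. This is a minimal baseline sketch, not the distributed or aggregated scheme of the paper. The teleportation parameter m = 0.15, the four-page graph, the grouping, and the reading of a group's aggregated value as the sum of its members' values are all assumptions made for illustration.

```python
def pagerank(out_links, m=0.15, tol=1e-12, max_iter=1000):
    """Centralized power iteration x <- (1 - m) * A x + (m / n) * 1.

    out_links[i] lists the pages that page i links to.
    """
    n = len(out_links)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [m / n] * n
        for page, targets in enumerate(out_links):
            if targets:
                # A page splits its value evenly among the pages it links to.
                share = (1.0 - m) * x[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumed convention: a page without outlinks spreads its
                # value uniformly over all pages.
                share = (1.0 - m) * x[page] / n
                for t in range(n):
                    new[t] += share
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


# Hypothetical 4-page web graph.
out_links = [[1, 2], [2], [0], [0, 2]]
ranks = pagerank(out_links)

# Hypothetical grouping; the aggregated value of a group is taken here
# to be the sum of its members' PageRank values (an assumption).
groups = {"g0": [0, 3], "g1": [1, 2]}
aggregated = {name: sum(ranks[i] for i in members)
              for name, members in groups.items()}
```

The sketch computes every page's value first and only then sums per group, so it shows what an aggregated value means but gives none of the computation or communication savings the abstract describes.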
Ergodic Randomized Algorithms and Dynamics over Networks
Algorithms and dynamics over networks often involve randomization, and
randomization may result in oscillating dynamics which fail to converge in a
deterministic sense. In this paper, we observe this undesired feature in three
applications, in which the dynamics is the randomized asynchronous counterpart
of a well-behaved synchronous one. These three applications are network
localization, PageRank computation, and opinion dynamics. Motivated by their
formal similarity, we show the following general fact, under the assumptions of
independence across time and linearity of the updates: if the expected
dynamics is stable and converges to the same limit as the original synchronous
dynamics, then the oscillations are ergodic and the desired limit can be
locally recovered via time-averaging.
Comment: 11 pages; submitted for publication. Revised version with fixed
technical flaw and updated reference
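The time-averaging idea can be tried on a toy opinion-dynamics model. The sketch below is an illustration under assumptions, not the paper's setup: three agents on a complete graph, hypothetical prejudices u, and a weight h = 0.5 are chosen here. In the synchronous rule every agent sets x_i to h*u_i + (1-h) times the mean of the others' values. In the randomized rule one random agent i copies from one random other agent j. The randomized state keeps fluctuating, but its running average should approach the synchronous limit.

```python
import random


def synchronous_limit(u, h, iters=200):
    """Iterate x_i <- h*u_i + (1-h)*mean_{j != i} x_j for all agents at once."""
    n = len(u)
    x = list(u)
    for _ in range(iters):
        total = sum(x)
        x = [h * u[i] + (1 - h) * (total - x[i]) / (n - 1) for i in range(n)]
    return x


def randomized_time_average(u, h, steps, seed=0):
    """Asynchronous counterpart: one random agent updates from one random peer.

    Returns (time average of the state, final state).
    """
    rng = random.Random(seed)
    n = len(u)
    x = list(u)
    running_sum = [0.0] * n
    for _ in range(steps):
        i = rng.randrange(n)
        j = rng.choice([k for k in range(n) if k != i])
        x[i] = h * u[i] + (1 - h) * x[j]   # linear update, independent over time
        for k in range(n):
            running_sum[k] += x[k]
    return [s / steps for s in running_sum], x


u = [0.0, 0.5, 1.0]          # hypothetical prejudices
limit = synchronous_limit(u, h=0.5)
average, last_state = randomized_time_average(u, h=0.5, steps=100_000)
```

Each agent only needs its own running sum to form the average, which is the sense in which the limit is recovered locally.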
Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which each page has obligatory
links, facultative links, and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph.
Comment: 39 page
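As a reading aid, the continuous problem without coupling constraints can be written in the following generic form. All notation is assumed here for illustration and is not taken from the abstract: a damping factor \alpha, a teleportation row vector z, the set I of controlled pages, per-page constraint sets \mathcal{P}_i, and the fixed rows \bar P of uncontrolled pages. The objective shown, the total PageRank of the controlled pages, is only one simple member of the class of problems described.

```latex
\begin{align*}
\max_{P,\,\pi}\quad & \sum_{i \in I} \pi_i \\
\text{s.t.}\quad
  & \pi = \pi\bigl(\alpha P + (1-\alpha)\,\mathbf{1} z\bigr),
    \qquad \pi \ge 0, \quad \pi \mathbf{1} = 1, \\
  & P_{i,\cdot} \in \mathcal{P}_i \quad \text{for } i \in I,
    \qquad P_{i,\cdot} = \bar P_{i,\cdot} \quad \text{for } i \notin I .
\end{align*}
```

Here \pi is the row vector of PageRank values, that is, the invariant measure of the damped transition matrix. Each set \mathcal{P}_i encodes which links from page i are obligatory, facultative, or forbidden.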
FrogWild! -- Fast PageRank Approximations on Graph Engines
We propose FrogWild, a novel algorithm for fast approximation of high
PageRank vertices, geared towards reducing network costs of running traditional
PageRank algorithms. Our algorithm can be seen as a quantized version of power
iteration that performs multiple parallel random walks over a directed graph.
One important innovation is that we introduce a modification to the GraphLab
framework that only partially synchronizes mirror vertices. This partial
synchronization vastly reduces the network traffic generated by traditional
PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank.
On the other hand, this partial synchronization also creates dependencies
between the random walks used to estimate PageRank. Our main theoretical
innovation is the analysis of the correlations introduced by this partial
synchronization process and a bound establishing that our approximation is
close to the true PageRank vector.
We implement our algorithm in GraphLab and compare it against the default
PageRank implementation. We show that our algorithm is very fast, performing
each iteration in less than one second on the Twitter graph, and that it can be
up to 7x faster than the standard GraphLab PageRank implementation.
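The random-walk view of PageRank behind this approach can be sketched on a single machine. The code below is a plain Monte Carlo estimate, not FrogWild itself: it involves no GraphLab, no mirror vertices, and no partial synchronization. The function name, the teleportation probability of 0.15, the handling of vertices without outlinks, and the five-vertex graph are assumptions made for illustration. Each walker starts at a uniformly random vertex and stops with probability p_teleport before every move. The fraction of walkers that stop at a vertex estimates that vertex's PageRank.

```python
import random
from collections import Counter


def random_walk_pagerank(out_links, num_walkers, p_teleport=0.15, seed=0):
    """Estimate PageRank from the stopping positions of independent walkers."""
    rng = random.Random(seed)
    n = len(out_links)
    stopped_at = Counter()
    for _ in range(num_walkers):
        v = rng.randrange(n)                  # uniform starting vertex
        while rng.random() >= p_teleport:     # keep walking w.p. 1 - p_teleport
            targets = out_links[v]
            # Assumed convention: from a vertex without outlinks,
            # jump to a uniformly random vertex.
            v = rng.choice(targets) if targets else rng.randrange(n)
        stopped_at[v] += 1
    return [stopped_at[v] / num_walkers for v in range(n)]


# Hypothetical graph: vertex 0 links to 1, every other vertex links to 0.
out_links = [[1], [0], [0], [0], [0]]
estimate = random_walk_pagerank(out_links, num_walkers=20_000)

# Vertices with the highest estimated PageRank.
top_two = sorted(range(len(out_links)), key=lambda v: -estimate[v])[:2]
```

Because only the highest-ranked vertices are of interest, a modest number of walkers is often enough to separate them from the rest, even when individual estimates are still noisy.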