Personalized PageRank with Node-dependent Restart
Personalized PageRank is an algorithm that ranks the importance of web pages
on a user-dependent basis. We introduce two generalizations of Personalized
PageRank with node-dependent restart. The first generalization is based on the
proportion of visits to nodes before the restart, whereas the second is based
on the probability of the node visited just before the restart. In the original
case of a constant restart probability, the two measures coincide. We discuss
interesting particular cases of restart probabilities and restart
distributions. We show that both generalizations of Personalized PageRank admit
an elegant expression connecting the so-called direct and reverse Personalized
PageRanks, which yields a symmetry property of these Personalized PageRanks.
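The first generalization above (the proportion of visits to each node before restart) can be sketched as a power iteration over a walk whose restart probability varies per node. The graph, restart probabilities, and preference vector below are illustrative assumptions, not taken from the paper; the sketch also assumes every node has at least one out-edge.

```python
def node_dependent_ppr(adj, r, s, iters=200):
    """Stationary distribution of a walk with node-dependent restart.

    adj : dict node -> list of out-neighbours (assumed non-empty)
    r   : dict node -> restart probability at that node
    s   : dict node -> restart (preference) distribution, sums to 1
    """
    nodes = list(adj)
    pi = {u: 1.0 / len(nodes) for u in nodes}      # start uniform
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            # with probability r[u], restart to a node drawn from s
            for v in nodes:
                nxt[v] += pi[u] * r[u] * s[v]
            # otherwise follow a uniformly random out-edge
            for v in adj[u]:
                nxt[v] += pi[u] * (1.0 - r[u]) / len(adj[u])
        pi = nxt
    return pi

# Tiny 3-node cycle; a constant restart probability recovers classic
# Personalized PageRank, matching the coincidence noted in the abstract.
adj = {0: [1], 1: [2], 2: [0]}
r = {0: 0.15, 1: 0.15, 2: 0.15}
s = {0: 1.0, 1: 0.0, 2: 0.0}                       # always restart at node 0
pi = node_dependent_ppr(adj, r, s)
```

Making some `r[u]` larger than others shifts occupation mass toward the restart distribution whenever the walk passes through those nodes, which is exactly what distinguishes the two generalized measures.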
Fast Distributed PageRank Computation
Over the last decade, PageRank has gained importance in a wide range of
applications and domains, ever since it first proved to be effective in
determining node importance in large graphs (and was a pioneering idea behind
Google's search engine). In distributed computing alone, PageRank vector, or
more generally random walk based quantities have been used for several
different applications ranging from determining important nodes, load
balancing, search, and identifying connectivity structures. Surprisingly,
however, there has been little work towards designing provably efficient
fully-distributed algorithms for computing PageRank. The difficulty is that
traditional matrix-vector multiplication style iterative methods may not always
adapt well to the distributed setting owing to communication bandwidth
restrictions and convergence rates.
In this paper, we present fast random walk-based distributed algorithms for
computing PageRanks in general graphs and prove strong bounds on the round
complexity. We first present a distributed algorithm that takes O\big(\log
n/\eps\big) rounds with high probability on any graph (directed or
undirected), where n is the network size and \eps is the reset probability
used in the PageRank computation (typically \eps is a fixed constant). We
then present a faster algorithm that takes O\big(\sqrt{\log n}/\eps\big)
rounds in undirected graphs. Both of the above algorithms are scalable, as each
node sends only a small (\polylog n) number of bits over each edge per round.
To the best of our knowledge, these are the first fully distributed algorithms
for computing the PageRank vector with provably efficient running time.
Comment: 14 pages
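The random-walk idea behind such algorithms can be illustrated on a single machine: launch short walks from every node, terminate each step with the reset probability \eps, and estimate PageRank from normalized visit counts. This is a minimal sequential sketch, not the distributed protocol itself; the constants are illustrative.

```python
import random

def monte_carlo_pagerank(adj, eps=0.15, walks_per_node=100, seed=0):
    """Estimate the PageRank vector from random-walk visit counts.

    Each walk terminates at every step with probability eps (or when it
    hits a dangling node), so its expected length is about 1/eps.
    """
    rng = random.Random(seed)
    visits = {u: 0 for u in adj}
    total = 0
    for start in adj:
        for _ in range(walks_per_node):
            u = start
            while True:
                visits[u] += 1
                total += 1
                if rng.random() < eps or not adj[u]:
                    break                      # walk resets here
                u = rng.choice(adj[u])
    # normalised visit counts approximate the PageRank vector
    return {u: visits[u] / total for u in visits}
```

In the distributed setting the same walks are advanced in parallel across the network, which is why the round complexity is governed by the walk length 1/\eps rather than by matrix-vector iterations.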
Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation
A fundamental problem arising in many applications in Web science and social
network analysis is, given an arbitrary approximation factor c > 1, to output a
set of nodes that with high probability contains all nodes of PageRank at
least a threshold \Delta, and no node of PageRank smaller than \Delta/c. We
call this problem {\sc SignificantPageRanks}. We develop a nearly optimal,
local algorithm for the problem with runtime complexity \tilde{O}(n/\Delta) on
networks with n nodes. We show that any algorithm for solving this problem
must have runtime of \Omega(n/\Delta), rendering our algorithm optimal up
to logarithmic factors.
Our algorithm comes with two main technical contributions. The first is a
multi-scale sampling scheme for a basic matrix problem that could be of
interest on its own. In the abstract matrix problem it is assumed that one can
access an unknown {\em right-stochastic matrix} by querying its rows, where the
cost of a query and the accuracy of the answers depend on a precision parameter
\epsilon. At a cost proportional to 1/\epsilon, the query returns a
list of O(1/\epsilon) entries and their indices that provide an
\epsilon-precision approximation of the row. Our task is to find a set that
contains all columns whose sum is at least the threshold \Delta, and omits any
column whose sum is significantly smaller. Our multi-scale sampling scheme
solves this problem at a cost well below that of traditional sampling
algorithms.
Our second main technical contribution is a new local algorithm for
approximating personalized PageRank, which is more robust than the earlier ones
developed in \cite{JehW03,AndersenCL06} and is highly efficient, particularly
for networks with large in-degrees or out-degrees. Together with our
multi-scale sampling scheme we are able to optimally solve the
{\sc SignificantPageRanks} problem.
Comment: Accepted to Internet Mathematics journal for publication. An extended
abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time
Algorithm for PageRank Computations".
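For context, the earlier local personalized-PageRank approach that this paper improves on can be sketched with the classic forward-push scheme of Andersen et al.: maintain settled mass `p` and residual mass `r`, and repeatedly push residual from any node whose residual per out-edge exceeds a tolerance. The parameters below are illustrative, and this is the baseline technique, not the paper's more robust variant.

```python
def forward_push_ppr(adj, source, alpha=0.15, rmax=1e-5):
    """Approximate personalized PageRank from `source` by local pushes.

    p : settled PageRank mass     r : residual mass still to be pushed
    Invariant: exact PPR = p plus the PPR contribution of the residual.
    """
    p = {u: 0.0 for u in adj}
    r = {u: 0.0 for u in adj}
    r[source] = 1.0
    while True:
        # pick any node with enough residual per out-edge to push
        u = next((w for w in adj if adj[w] and r[w] > rmax * len(adj[w])), None)
        if u is None:
            break
        p[u] += alpha * r[u]                       # settle alpha fraction
        share = (1.0 - alpha) * r[u] / len(adj[u])
        r[u] = 0.0
        for v in adj[u]:                           # spread the rest
            r[v] += share
    return p
```

The cost is local: only nodes near the source ever accumulate enough residual to be pushed, which is what makes sublinear-time guarantees possible on suitable graphs.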
Asymptotic analysis for personalized Web search
Personalized PageRank is used in Web search as an importance measure for Web documents. The goal of this paper is to characterize the tail behavior of the PageRank distribution in the Web and other complex networks characterized by power laws. To this end, we model the PageRank as a solution of the stochastic equation R \stackrel{d}{=} \sum_{i=1}^{N} A_i R_i + B, where the R_i's are distributed as R. This equation is inspired by the original definition of the PageRank. In particular, N models the number of incoming links of a page, and B stands for the user preference. Assuming that N or B are heavy-tailed, we employ the theory of regular variation to obtain the asymptotic behavior of R under quite general assumptions on the involved random variables. Our theoretical predictions show good agreement with experimental data.
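A stochastic fixed-point equation of this form, R distributed as \sum_{i=1}^{N} A_i R_i + B, can be sampled by unrolling the recursion to a finite depth. The specific distributions of N, A, and B below are assumptions chosen for the sketch (with E[N] \cdot A < 1 so the recursion contracts), not the paper's model of the Web graph.

```python
import random

def sample_R(rng, depth=12):
    """One draw of R from the fixed-point equation, truncated at `depth`.

    Assumed model: N uniform on {0,1,2,3}, A = 0.85/3 constant,
    B = 0.15 constant; then E[N]*A = 0.425 < 1, so truncation error
    at depth 12 is negligible.
    """
    if depth == 0:
        return 0.0
    N = rng.choice([0, 1, 2, 3])        # in-degree of the page (assumed)
    A = 0.85 / 3.0                      # weight per in-link (assumed)
    B = 0.15                            # user-preference term (assumed)
    return B + sum(A * sample_R(rng, depth - 1) for _ in range(N))

rng = random.Random(1)
draws = [sample_R(rng) for _ in range(2000)]
mean_R = sum(draws) / len(draws)        # analytic mean is B/(1 - E[N]*A)
```

With these constants the analytic mean is 0.15/0.575, about 0.261; heavy tails appear precisely when N or B is heavy-tailed, which this light-tailed toy model deliberately avoids.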
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
{\it SimRank} is a classic measure of the similarities of nodes in a graph.
Given a node u in a graph G = (V, E), a {\em single-source SimRank query}
returns the SimRank similarities s(u, v) between node u and each node v \in V.
This type of query has numerous applications in web search and social
network analysis, such as link prediction, web mining, and spam detection.
Existing methods for single-source SimRank queries, however, incur query cost
at least linear in the number of nodes n, which renders them inapplicable for
real-time and interactive analysis.
This paper proposes \prsim, an algorithm that exploits the structure of
graphs to efficiently answer single-source SimRank queries. \prsim uses an
index of size O(m), where m is the number of edges in the graph, and
guarantees a query time that depends on the {\em reverse PageRank} distribution
of the input graph. In particular, we prove that \prsim runs in sub-linear time
if the degree distribution of the input graph follows the power-law
distribution, a property possessed by many real-world graphs. Based on the
theoretical analysis, we show that the empirical query time of all existing
SimRank algorithms also depends on the reverse PageRank distribution of the
graph. Finally, we present the first experimental study that evaluates the
absolute errors of various SimRank algorithms on large graphs, and we show that
\prsim outperforms the state of the art in terms of query time, accuracy, index
size, and scalability.
Comment: ACM SIGMOD 2019
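SimRank has a well-known random-walk characterization that explains the connection to reverse PageRank: s(u, v) is the expected value of c^\tau, where \tau is the first step at which two independent walks from u and v, each moving along in-edges, meet. The Monte Carlo sketch below illustrates that definition; it is not \prsim, and the graph and constants are illustrative.

```python
import random

def simrank_mc(in_adj, u, v, c=0.6, walks=2000, max_len=10, seed=0):
    """Estimate SimRank s(u, v) as E[c^tau] over paired reverse walks.

    in_adj : dict node -> list of in-neighbours
    tau    : first step at which the two walks land on the same node
    """
    if u == v:
        return 1.0                       # s(u, u) = 1 by definition
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x, y = u, v
        for step in range(1, max_len + 1):
            if not in_adj[x] or not in_adj[y]:
                break                    # a walk has no in-edge to follow
            x = rng.choice(in_adj[x])
            y = rng.choice(in_adj[y])
            if x == y:
                total += c ** step       # walks met: contribute c^tau
                break
    return total / walks
```

Because both walks move along in-edges, nodes that are heavy under the reverse PageRank distribution are where walk pairs tend to meet, which is the intuition behind index sizes and query times that depend on that distribution.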
Improved Distortion and Spam Resistance for PageRank
For a directed graph G = (V, E), a ranking function, such as PageRank,
provides a way of mapping elements of V to non-negative real numbers so that
nodes can be ordered. Brin and Page argued that the stationary distribution,
\pi, of a random walk on G is an effective ranking function for queries on
an idealized web graph. However, \pi is not defined for all G, and in
particular, it is not defined for the real web graph. Thus, they introduced
PageRank to approximate \pi for graphs with ergodic random walks while
being defined on all graphs.
PageRank is defined as a random walk on a graph, where with probability
1 - \eps, a random out-edge is traversed, and with \emph{reset
probability} \eps the random walk instead restarts at a node selected
using a \emph{reset vector} v. Originally, v was taken to be
uniform on the nodes, and we call this version UPR.
In this paper, we introduce graph-theoretic notions of quality for ranking
functions, specifically \emph{distortion} and \emph{spam resistance}. We show
that UPR has high distortion and low spam resistance, and we show how to select
a reset vector that yields low distortion and high spam resistance.
Comment: 36 pages
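PageRank with a general reset vector, as defined in the abstract above, can be sketched as a power iteration: each step keeps the reset mass \eps \cdot v and spreads the remaining 1 - \eps mass along random out-edges. The graph and constants below are illustrative; dangling nodes are handled by spreading their mass everywhere, one common convention the abstract does not specify.

```python
def pagerank(adj, v, eps=0.15, iters=100):
    """Power iteration for PageRank with an arbitrary reset vector v.

    adj : dict node -> list of out-neighbours
    v   : dict node -> reset probability mass, sums to 1
    """
    pi = dict(v)                               # start from the reset vector
    for _ in range(iters):
        nxt = {u: eps * v[u] for u in adj}     # reset mass: eps * v
        for u in adj:
            # dangling nodes spread mass uniformly (assumed convention)
            out = adj[u] if adj[u] else list(adj)
            for w in out:
                nxt[w] += (1.0 - eps) * pi[u] / len(out)
        pi = nxt
    return pi

adj = {0: [1, 2], 1: [0], 2: [0]}
upr = pagerank(adj, {u: 1 / 3 for u in adj})   # uniform reset vector: UPR
```

Choosing a non-uniform v is exactly the lever the paper analyzes: concentrating reset mass differently changes both how faithfully the ranking reflects the graph (distortion) and how much rank a link farm can capture (spam resistance).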
Identifying Diabetes-Related Important Protein Targets with few Interacting Partners with the PageRank Algorithm
Diabetes is a growing concern for developed nations worldwide. New genomic, metagenomic and gene-technological approaches may yield considerable results in the next several years in its early diagnosis, or in advances in therapy and management. In this work, we highlight some human proteins that may serve as new targets in early diagnosis and therapy. With the help of a very successful mathematical tool for network analysis that formed the basis of the early successes of Google(TM), Inc., we analyse the human protein-protein interaction network gained from the IntAct database with a mathematical algorithm. The novelty of our approach is that the new protein targets suggested do not have many interacting partners (so they are not hubs or super-hubs), and hence their inhibition or promotion will probably not have serious side effects. We have identified numerous possible protein targets for diabetes therapy and/or management; some of these have been well known for a long time (these validate our method), some appeared in the literature in the last 12 months (these show the cutting edge of the algorithm), and the remainder are not yet known to be connected with diabetes, representing completely new hits of the method.
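The selection idea described above, rank proteins by PageRank but keep only those with few interacting partners, can be sketched as follows. The toy interaction network, degree cutoff, and result count are illustrative assumptions, not IntAct data or the paper's actual parameters.

```python
def low_degree_high_rank(edges, eps=0.15, iters=100, max_degree=2, top=3):
    """Rank proteins by PageRank, then drop hubs (high-degree nodes).

    edges : list of (protein, protein) undirected interaction pairs
    Returns up to `top` highly ranked proteins with degree <= max_degree.
    """
    adj = {}
    for a, b in edges:                          # build undirected adjacency
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    pi = {u: 1.0 / len(adj) for u in adj}
    for _ in range(iters):                      # standard PageRank iteration
        nxt = {u: eps / len(adj) for u in adj}
        for u in adj:
            for w in adj[u]:
                nxt[w] += (1.0 - eps) * pi[u] / len(adj[u])
        pi = nxt
    ranked = sorted(pi, key=pi.get, reverse=True)
    # keep highly ranked proteins that are NOT hubs
    return [u for u in ranked if len(adj[u]) <= max_degree][:top]
```

Filtering by degree after ranking is what separates this approach from plain hub detection: a protein can score highly because it sits next to important neighbours, even with only one or two interaction partners.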