Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation
A fundamental problem arising in many applications in Web science and social
network analysis is, given an arbitrary approximation factor $c > 1$, to output a
set of nodes that with high probability contains all nodes of PageRank at
least $\Delta$, and no node of PageRank smaller than $\Delta/c$. We call this
problem {\sc SignificantPageRanks}. We develop a nearly optimal, local
algorithm for the problem with runtime complexity $\tilde{O}(n/\Delta)$ on
networks with $n$ nodes. We show that any algorithm for solving this problem
must have runtime of $\Omega(n/\Delta)$, rendering our algorithm optimal up
to logarithmic factors.
Our algorithm comes with two main technical contributions. The first is a
multi-scale sampling scheme for a basic matrix problem that could be of
interest in its own right. In the abstract matrix problem it is assumed that one can
access an unknown {\em right-stochastic matrix} by querying its rows, where the
cost of a query and the accuracy of the answers depend on a precision parameter
$\epsilon$. At a cost proportional to $1/\epsilon$, the query will return a
list of $O(1/\epsilon)$ entries and their indices that provide an
$\epsilon$-precision approximation of the row. Our task is to find a set that
contains all columns whose sum is at least $\Delta$, and omits any column whose
sum is less than $\Delta/c$. Our multi-scale sampling scheme solves this
problem with cost $\tilde{O}(n/\Delta)$, while traditional sampling algorithms
would take time $\Omega(n/\Delta^2)$.
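As an illustration of the query model, the following Python sketch implements the naive single-precision baseline that multi-scale sampling improves on; the query_row oracle, the sampling budget, and the acceptance threshold are assumptions made for illustration, not the paper's interface or analysis.

    import random
    from collections import defaultdict

    def naive_column_sums(query_row, n, delta, num_samples):
        # Naive baseline: sample rows uniformly at a single fixed precision.
        # query_row(i, eps) is an assumed oracle returning (column, value)
        # pairs that approximate row i to precision eps, at cost ~ 1/eps.
        sums = defaultdict(float)
        for _ in range(num_samples):
            i = random.randrange(n)
            for col, val in query_row(i, delta):
                sums[col] += val
        # Rescale: each row is sampled with expected frequency num_samples / n.
        return {col: s * n / num_samples for col, s in sums.items()}

    def significant_columns(estimates, delta, c):
        # Accept columns whose estimate clears a threshold placed between
        # delta / c and delta, so that sufficiently accurate estimates keep
        # every column with true sum >= delta and drop those below delta / c.
        return {col for col, s in estimates.items() if s >= delta * (1 + 1 / c) / 2}

Roughly speaking, the multi-scale scheme avoids paying for high precision on every sampled row at once by spreading its queries over a range of precision levels.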
Our second main technical contribution is a new local algorithm for
approximating personalized PageRank, which is more robust than the earlier ones
developed in \cite{JehW03,AndersenCL06} and is highly efficient particularly
for networks with large in-degrees or out-degrees. Together with our multi-scale
sampling scheme, we are able to optimally solve the {\sc SignificantPageRanks}
problem.

Comment: Accepted to Internet Mathematics journal for publication. An extended abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time Algorithm for PageRank Computations".
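For context, here is a minimal Python sketch of the classical forward local-push scheme from the cited line of work \cite{JehW03,AndersenCL06}, not the more robust variant contributed by the paper; the adjacency-list representation and the push threshold are illustrative assumptions.

    from collections import defaultdict

    def approximate_ppr(graph, source, alpha=0.15, eps=1e-6):
        # graph[u] is the list of u's out-neighbors. p accumulates a lower
        # bound on the personalized PageRank vector of `source`; r holds
        # probability mass that has not been pushed yet.
        p = defaultdict(float)
        r = defaultdict(float)
        r[source] = 1.0
        queue = [source]
        while queue:
            u = queue.pop()
            deg = len(graph.get(u, []))
            if r[u] < eps * max(deg, 1):
                continue                      # stale queue entry
            mass, r[u] = r[u], 0.0
            p[u] += alpha * mass
            if deg == 0:
                continue                      # dangling node: this sketch drops its mass
            share = (1 - alpha) * mass / deg
            for v in graph[u]:
                r[v] += share
                if r[v] >= eps * max(len(graph.get(v, [])), 1):
                    queue.append(v)
        return dict(p)

The run is local: only nodes whose residual exceeds eps times their outdegree are ever expanded, so the cost depends on the neighborhood of `source` rather than on the whole graph.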
Bidirectional PageRank Estimation: From Average-Case to Worst-Case
We present a new algorithm for estimating the Personalized PageRank (PPR)
between a source and target node on undirected graphs, with sublinear
running-time guarantees over the worst-case choice of source and target nodes.
Our work builds on a recent line of work on bidirectional estimators for PPR,
which obtained sublinear running-time guarantees but in an average-case sense,
for a uniformly random choice of target node. Crucially, we show how the
reversibility of random walks on undirected networks can be exploited to
convert average-case to worst-case guarantees. While past bidirectional methods
combine forward random walks with reverse local pushes, our algorithm combines
forward local pushes with reverse random walks. We also discuss how to modify
our methods to estimate random-walk probabilities for any length distribution,
thereby obtaining fast algorithms for estimating general graph diffusions,
including the heat kernel, on undirected networks.

Comment: Workshop on Algorithms and Models for the Web-Graph (WAW) 2015
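As a concrete sketch of the bidirectional idea, here is the earlier reverse-push-plus-forward-walks combination that this work builds on (the paper itself inverts the two directions), in Python, assuming an undirected graph given as adjacency lists with no isolated nodes.

    import random
    from collections import defaultdict

    def reverse_push(graph, target, alpha, r_max):
        # Local push toward `target`; on an undirected graph the forward and
        # reverse neighborhoods coincide. Maintains, for every node s,
        #   ppr(s -> target) = p[s] + sum_v ppr(s -> v) * r[v].
        p, r = defaultdict(float), defaultdict(float)
        r[target] = 1.0
        queue = [target]
        while queue:
            v = queue.pop()
            if r[v] < r_max:
                continue
            mass, r[v] = r[v], 0.0
            p[v] += alpha * mass
            for u in graph[v]:
                r[u] += (1 - alpha) * mass / len(graph[u])
                if r[u] >= r_max:
                    queue.append(u)
        return p, r

    def bidirectional_ppr(graph, source, target, alpha=0.15, r_max=1e-4, walks=10000):
        p, r = reverse_push(graph, target, alpha, r_max)
        total = 0.0
        for _ in range(walks):
            v = source
            while random.random() > alpha:    # walk length is Geometric(alpha)
                v = random.choice(graph[v])
            total += r[v]                     # endpoint is a sample from ppr(source -> .)
        return p[source] + total / walks

The combination reduces variance: every walk contributes at most r_max to the estimate, so far fewer walks are needed than in a purely Monte Carlo approach.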
Quick Detection of High-degree Entities in Large Directed Networks
In this paper, we address the problem of quick detection of high-degree
entities in large online social networks. The practical importance of this problem
is attested by the large number of companies that continuously collect and update
statistics about popular entities, usually using the degree of an entity as an
approximation of its popularity. We suggest a simple, efficient, and
easy-to-implement two-stage randomized algorithm that provides highly accurate
solutions to this problem. For instance, our algorithm needs only one thousand
API requests to find the top-100 most followed users on Twitter, a
network with approximately a billion registered users, with more than 90%
precision. Our algorithm significantly outperforms existing methods and serves
many different purposes, such as finding the most popular users or the most
popular interest groups in social networks. An important contribution of this
work is the analysis of the proposed algorithm using Extreme Value Theory -- a
branch of probability that studies extreme events and properties of largest
order statistics in random samples. Using this theory, we derive an accurate
prediction for the algorithm's performance and show that the number of API
requests for finding the top-k most popular entities is sublinear in the number
of entities. Moreover, we formally show that the high variability among the
entities, expressed through heavy-tailed distributions, is the reason for the
algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical
way.
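A minimal two-stage sketch in this spirit follows; random_user, followees_of, and degree_of are hypothetical API wrappers (uniform user sampling, followee lists, exact follower counts), and the 700/300 budget split is illustrative rather than the tuning analyzed in the paper.

    from collections import Counter

    def top_k_popular(random_user, followees_of, degree_of, k, n1=700, n2=300):
        # Stage 1: an entity followed by a sizable fraction of the network
        # shows up in many sampled followee lists, so raw appearance counts
        # already rank the candidates (n1 API requests).
        counts = Counter()
        for _ in range(n1):
            counts.update(followees_of(random_user()))
        candidates = [u for u, _ in counts.most_common(n2)]
        # Stage 2: spend the remaining budget on exact degree lookups
        # (n2 API requests) and report the top k.
        exact = {u: degree_of(u) for u in candidates}
        return sorted(exact, key=exact.get, reverse=True)[:k]

With n1 + n2 = 1000 this matches the request budget quoted above; a heavy-tailed degree distribution is what lets a few hundred samples separate the top entities from the rest.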
Sublinear algorithms for local graph centrality estimation
We study the complexity of local graph centrality estimation, with the goal
of approximating the centrality score of a given target node while exploring
only a sublinear number of nodes/arcs of the graph and performing a sublinear
number of elementary operations. We develop a technique, which we apply to the
PageRank and Heat Kernel centralities, for building a low-variance score
estimator through a local exploration of the graph. We obtain an algorithm
that, given any node in any graph of $m$ arcs, with probability $1-\delta$
computes a multiplicative $(1 \pm \epsilon)$-approximation of its score by
examining only $\tilde{O}(\min(m^{2/3}\Delta^{1/3}d^{-2/3},\, m^{4/5}d^{-3/5}))$
nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and average
outdegree of the graph (omitting $\mathrm{poly}(\epsilon^{-1})$ and
$\mathrm{polylog}(\delta^{-1})$ factors for readability). A similar bound holds
for computational complexity. We also prove a lower bound of
$\Omega(\min(m^{1/2}\Delta^{1/2}d^{-1/2},\, m^{2/3}d^{-1/3}))$ for both query
complexity and computational complexity. Moreover, our technique yields a
sublinear query complexity algorithm for the graph access model of
[Brautbar et al., 2010], widely used in social network mining; we show this
algorithm is optimal up to a sublogarithmic factor. These are the first
algorithms yielding worst-case sublinear bounds for general directed graphs and
any choice of the target node.

Comment: 29 pages, 1 figure
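For contrast, the naive Monte Carlo baseline that such low-variance estimators are designed to beat can be sketched in a few lines of Python; the dangling-node convention and the walk budget are assumptions for illustration.

    import random

    def mc_pagerank_score(nodes, graph, target, alpha=0.15, walks=100000):
        # The endpoint of an alpha-terminated random walk from a uniform
        # start node is a sample from the PageRank distribution, so the hit
        # frequency at `target` estimates its score.
        hits = 0
        for _ in range(walks):
            v = random.choice(nodes)
            while random.random() > alpha:
                out = graph.get(v, [])
                # Dangling nodes teleport to a uniform node (one common convention).
                v = random.choice(out) if out else random.choice(nodes)
            hits += (v == target)
        return hits / walks

For a score of order $1/n$ this needs on the order of $n$ walks just to see the target once, which is why plain Monte Carlo is not sublinear and why low-variance local estimation matters.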
Fast Local Computation Algorithms
For input $x$, let $F(x)$ denote the set of outputs that are the "legal"
answers for a computational problem $F$. Suppose $x$ and members of $F(x)$ are
so large that there is not time to read them in their entirety. We propose a
model of {\em local computation algorithms} which, for a given input $x$,
support queries by a user to values of specified locations $y_i$ in a legal
output $y \in F(x)$. When more than one legal output $y$ exists for a given
$x$, the local computation algorithm should output in a way that is consistent
with at least one such $y$. Local computation algorithms are intended to
distill the common features of several concepts that have appeared in various
algorithmic subfields, including local distributed computation, local
algorithms, locally decodable codes, and local reconstruction.
We develop a technique, based on known constructions of small sample spaces
of $k$-wise independent random variables and Beck's analysis in his algorithmic
approach to the Lov{\'{a}}sz Local Lemma, which under certain conditions can be
applied to construct local computation algorithms that run in {\em
polylogarithmic} time and space. We apply this technique to maximal independent
set computations, scheduling radio network broadcasts, hypergraph coloring, and
satisfying $k$-SAT formulas.

Comment: A preliminary version of this paper appeared in ICS 2011, pp. 223-23
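To make the model concrete, here is a minimal local computation algorithm for maximal independent set in Python, using the folklore random-priority construction rather than the paper's polylogarithmic technique; answers to different queries are consistent with one global legal output because they are all derived from the same shared priorities.

    import random

    def make_mis_lca(graph, seed=0):
        # graph maps each node to its neighbor list. A node joins the MIS
        # iff none of its higher-priority (lower-value) neighbors does, i.e.
        # we locally simulate the greedy MIS under a fixed random order.
        rng = random.Random(seed)
        priority = {v: rng.random() for v in graph}
        cache = {}

        def in_mis(v):
            if v not in cache:
                cache[v] = all(not in_mis(u) for u in graph[v]
                               if priority[u] < priority[v])
            return cache[v]

        return in_mis

For example, make_mis_lca({0: [1], 1: [0, 2], 2: [1]})(1) answers a single membership query without ever materializing the full independent set.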