
    Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation

    A fundamental problem arising in many applications in Web science and social network analysis is, given an arbitrary approximation factor $c>1$, to output a set $S$ of nodes that with high probability contains all nodes of PageRank at least $\Delta$, and no node of PageRank smaller than $\Delta/c$. We call this problem {\sc SignificantPageRanks}. We develop a nearly optimal, local algorithm for the problem with runtime complexity $\tilde{O}(n/\Delta)$ on networks with $n$ nodes. We show that any algorithm for solving this problem must have runtime $\Omega(n/\Delta)$, rendering our algorithm optimal up to logarithmic factors. Our algorithm rests on two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest in its own right. In the abstract matrix problem it is assumed that one can access an unknown {\em right-stochastic matrix} by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter $\epsilon$. At a cost proportional to $1/\epsilon$, the query returns a list of $O(1/\epsilon)$ entries and their indices that provide an $\epsilon$-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least $\Delta$, and omits any column whose sum is less than $\Delta/c$. Our multi-scale sampling scheme solves this problem at cost $\tilde{O}(n/\Delta)$, whereas traditional sampling algorithms would take time $\Theta((n/\Delta)^2)$. Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in \cite{JehW03,AndersenCL06} and is highly efficient particularly for networks with large in-degrees or out-degrees. Together with our multi-scale sampling scheme, it lets us solve the {\sc SignificantPageRanks} problem optimally. Comment: Accepted to the Internet Mathematics journal for publication. An extended abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time Algorithm for PageRank Computations".
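    As a point of reference for the matrix problem above, here is a minimal sketch of the single-scale baseline the multi-scale scheme improves on: estimate every column sum by averaging uniformly sampled rows, then threshold between $\Delta$ and $\Delta/c$. The row oracle `query_row` and the geometric-mean tie-break are assumptions for illustration, not the paper's construction.

```python
import random
from collections import defaultdict

def estimate_column_sums(query_row, n, num_samples, rng=random):
    """Estimate the column sums of an unknown n x n right-stochastic
    matrix by averaging uniformly sampled rows. `query_row(i)` is an
    assumed oracle returning row i as a dict {column_index: value}."""
    sums = defaultdict(float)
    for _ in range(num_samples):
        for j, v in query_row(rng.randrange(n)).items():
            sums[j] += v
    # A column with true sum sigma_j contributes sigma_j / n per sampled
    # row in expectation, so scale the sample total back up by n.
    return {j: s * n / num_samples for j, s in sums.items()}

def significant_columns(query_row, n, delta, c, num_samples):
    """Candidate set in the SignificantPageRanks style: keep every column
    whose estimate clears the geometric mean of Delta and Delta/c
    (an assumed tie-break between the two thresholds)."""
    est = estimate_column_sums(query_row, n, num_samples)
    return {j for j, s in est.items() if s >= delta / c ** 0.5}
```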

    Bidirectional PageRank Estimation: From Average-Case to Worst-Case

    We present a new algorithm for estimating the Personalized PageRank (PPR) between a source and target node on undirected graphs, with sublinear running-time guarantees over the worst-case choice of source and target nodes. Our work builds on a recent line of work on bidirectional estimators for PPR, which obtained sublinear running-time guarantees but in an average-case sense, for a uniformly random choice of target node. Crucially, we show how the reversibility of random walks on undirected networks can be exploited to convert average-case to worst-case guarantees. While past bidirectional methods combine forward random walks with reverse local pushes, our algorithm combines forward local pushes with reverse random walks. We also discuss how to modify our methods to estimate random-walk probabilities for any length distribution, thereby obtaining fast algorithms for estimating general graph diffusions, including the heat kernel, on undirected networks. Comment: Workshop on Algorithms and Models for the Web-Graph (WAW) 201
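    A hedged sketch of the push-walk combination described above: an Andersen-Chung-Lang-style forward push from the source leaves a residual vector $r$ satisfying $\pi_s(t) = p[t] + \sum_v r[v]\,\pi_v(t)$, and undirected reversibility ($\pi_v(t)\,d_v = \pi_t(v)\,d_t$) turns the residual term into an expectation over endpoints of $\alpha$-terminated walks from the target. Oracle layout and constants here are illustrative, not the paper's.

```python
import random
from collections import defaultdict

def forward_push(graph, source, alpha, r_max):
    """ACL-style forward push. Returns (p, r) with the invariant
    ppr(source, t) = p[t] + sum_v r[v] * ppr(v, t) for every t.
    `graph` maps node -> list of neighbours (undirected, no isolated nodes)."""
    p, r = defaultdict(float), defaultdict(float)
    r[source] = 1.0
    work = [source]
    while work:
        u = work.pop()
        if r[u] <= r_max * len(graph[u]):
            continue
        ru, r[u] = r[u], 0.0
        p[u] += alpha * ru
        share = (1.0 - alpha) * ru / len(graph[u])
        for v in graph[u]:
            r[v] += share
            if r[v] > r_max * len(graph[v]):
                work.append(v)
    return p, r

def bidirectional_ppr(graph, source, target, alpha=0.2, r_max=1e-4,
                      walks=10_000, rng=random):
    """Estimate ppr(source, target): forward push from the source, then
    alpha-terminated random walks from the target. On undirected graphs
    ppr(v, t) * deg(v) = ppr(t, v) * deg(t), so the residual term equals
    deg(target) * E[r[V] / deg(V)] over walk endpoints V."""
    p, r = forward_push(graph, source, alpha, r_max)
    acc = 0.0
    for _ in range(walks):
        v = target
        while rng.random() > alpha:      # each step stops w.p. alpha
            v = rng.choice(graph[v])
        acc += r[v] / len(graph[v])
    return p[target] + len(graph[target]) * acc / walks
```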

    Quick Detection of High-degree Entities in Large Directed Networks

    In this paper, we address the problem of quickly detecting high-degree entities in large online social networks. The practical importance of this problem is attested by the large number of companies that continuously collect and update statistics about popular entities, usually using the degree of an entity as an approximation of its popularity. We suggest a simple, efficient, and easy-to-implement two-stage randomized algorithm that provides highly accurate solutions to this problem. For instance, our algorithm needs only one thousand API requests to find the top-100 most followed users on Twitter, a network with approximately a billion registered users, with more than 90% precision. Our algorithm significantly outperforms existing methods and serves many different purposes, such as finding the most popular users or the most popular interest groups in social networks. An important contribution of this work is the analysis of the proposed algorithm using Extreme Value Theory -- a branch of probability that studies extreme events and the properties of the largest order statistics in random samples. Using this theory, we derive an accurate prediction for the algorithm's performance and show that the number of API requests needed to find the top-k most popular entities is sublinear in the number of entities. Moreover, we formally show that the high variability among the entities, expressed through heavy-tailed distributions, is the reason for the algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical way.
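    The two-stage idea lends itself to a compact sketch: stage one samples random users and counts how often each account appears in their followee lists (an account's frequency is proportional to its in-degree in expectation), and stage two spends the remaining API budget verifying the exact degrees of the most frequent candidates. The oracle functions and the budget split below are assumed stand-ins, not a real API.

```python
import random
from collections import Counter

def top_k_high_degree(random_user, followees, degree, k, n1, n2, rng=random):
    """Two-stage sketch: `random_user(rng)` samples a uniformly random
    account, `followees(u)` lists who u follows, and `degree(u)` returns
    u's follower count -- all hypothetical API stand-ins.
    Stage 1: count how often each account is followed by n1 random users.
    Stage 2: verify the exact degrees of the n2 most frequent candidates."""
    counts = Counter()
    for _ in range(n1):
        counts.update(followees(random_user(rng)))
    candidates = [u for u, _ in counts.most_common(n2)]
    return sorted(candidates, key=degree, reverse=True)[:k]
```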

    Sublinear algorithms for local graph centrality estimation

    We study the complexity of local graph centrality estimation, with the goal of approximating the centrality score of a given target node while exploring only a sublinear number of nodes/arcs of the graph and performing a sublinear number of elementary operations. We develop a technique, which we apply to the PageRank and Heat Kernel centralities, for building a low-variance score estimator through a local exploration of the graph. We obtain an algorithm that, given any node in any graph of $m$ arcs, with probability $(1-\delta)$ computes a multiplicative $(1\pm\epsilon)$-approximation of its score by examining only $\tilde{O}(\min(m^{2/3} \Delta^{1/3} d^{-2/3},\, m^{4/5} d^{-3/5}))$ nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and average outdegree of the graph (omitting for readability $\operatorname{poly}(\epsilon^{-1})$ and $\operatorname{polylog}(\delta^{-1})$ factors). A similar bound holds for computational complexity. We also prove a lower bound of $\Omega(\min(m^{1/2} \Delta^{1/2} d^{-1/2},\, m^{2/3} d^{-1/3}))$ for both query complexity and computational complexity. Moreover, our technique yields a $\tilde{O}(n^{2/3})$ query complexity algorithm for the graph access model of [Brautbar et al., 2010], widely used in social network mining; we show this algorithm is optimal up to a sublogarithmic factor. These are the first algorithms yielding worst-case sublinear bounds for general directed graphs and any choice of the target node. Comment: 29 pages, 1 figure
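    For contrast with the low-variance estimator above, here is the naive Monte Carlo baseline for local PageRank estimation: the target's score is the probability that an $\alpha$-restart walk from a uniformly random start node is at the target when it stops, so the relative error degrades badly for low-score targets. A sketch under an assumed adjacency oracle, not the paper's algorithm.

```python
import random

def naive_pagerank_estimate(nodes, out_neighbours, target, alpha=0.15,
                            walks=100_000, rng=random):
    """PageRank(target) equals the chance that an alpha-terminated walk
    from a uniform start ends at `target`; estimate it by counting hits.
    `out_neighbours(v)` is an assumed oracle returning v's out-arcs."""
    hits = 0
    for _ in range(walks):
        v = rng.choice(nodes)
        while rng.random() > alpha:          # each step stops w.p. alpha
            outs = out_neighbours(v)
            # Dangling node: restart uniformly (one common convention).
            v = rng.choice(outs) if outs else rng.choice(nodes)
        hits += (v == target)
    return hits / walks
```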

    Fast Local Computation Algorithms

    For input $x$, let $F(x)$ denote the set of outputs that are the "legal" answers for a computational problem $F$. Suppose $x$ and members of $F(x)$ are so large that there is not enough time to read them in their entirety. We propose a model of {\em local computation algorithms} which, for a given input $x$, support queries by a user to values of specified locations $y_i$ in a legal output $y \in F(x)$. When more than one legal output $y$ exists for a given $x$, the local computation algorithm should answer in a way that is consistent with at least one such $y$. Local computation algorithms are intended to distill the common features of several concepts that have appeared in various algorithmic subfields, including local distributed computation, local algorithms, locally decodable codes, and local reconstruction. We develop a technique, based on known constructions of small sample spaces of $k$-wise independent random variables and Beck's analysis in his algorithmic approach to the Lovász Local Lemma, which under certain conditions can be applied to construct local computation algorithms that run in {\em polylogarithmic} time and space. We apply this technique to maximal independent set computations, scheduling radio network broadcasts, hypergraph coloring, and satisfying $k$-SAT formulas. Comment: A preliminary version of this paper appeared in ICS 2011, pp. 223-23
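    To make the model concrete, here is the textbook local computation algorithm for maximal independent set via random priorities -- in the spirit of, though much simpler than, the paper's $k$-wise-independence construction. Each query recursively inspects only lower-ranked neighbours, and every answer is consistent with the single legal output determined by the seed.

```python
import random
from functools import lru_cache

def make_mis_oracle(neighbours, seed=0):
    """Query access to one maximal independent set: node v is in the MIS
    iff no lower-ranked neighbour is (random-priority greedy). Ranks are
    derived from a fixed seed, so repeated queries agree with one legal
    output. `neighbours(v)` is an assumed adjacency oracle.
    Usage: in_mis = make_mis_oracle(lambda v: adj[v]); in_mis(3)"""
    def rank(v):
        return random.Random(hash((seed, v))).random()

    @lru_cache(maxsize=None)
    def in_mis(v):
        return all(not in_mis(u) for u in neighbours(v) if rank(u) < rank(v))

    return in_mis
```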