FrogWild! -- Fast PageRank Approximations on Graph Engines

Authors: Borokhovich Michael, Caramanis Constantine, Dimakis Alexandros G., Mitliagkas Ioannis
Publication date: 14/02/2015

We propose FrogWild, a novel algorithm for fast approximation of high-PageRank vertices, geared towards reducing the network costs of running traditional PageRank algorithms. Our algorithm can be seen as a quantized version of power iteration that performs multiple parallel random walks over a directed graph. One important innovation is a modification to the GraphLab framework that only partially synchronizes mirror vertices. This partial synchronization vastly reduces the network traffic generated by traditional PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank. On the other hand, partial synchronization also creates dependencies between the random walks used to estimate PageRank. Our main theoretical innovation is the analysis of the correlations introduced by this partial synchronization process and a bound establishing that our approximation is close to the true PageRank vector. We implement our algorithm in GraphLab and compare it against the default PageRank implementation. We show that our algorithm is very fast, performing each iteration in less than one second on the Twitter graph, and can be up to 7x faster than the standard GraphLab PageRank implementation.
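
The idea of estimating PageRank from many parallel random walks can be illustrated with a minimal single-machine sketch. This is a generic Monte Carlo estimate, not the authors' FrogWild implementation: it does not model GraphLab, mirror vertices, or partial synchronization. The function name `random_walk_pagerank`, its parameters, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_walk_pagerank(out_edges, num_walkers=20000, num_steps=10,
                         teleport=0.15, seed=0):
    """Estimate PageRank from the end positions of independent random walkers.

    out_edges maps each vertex to the list of vertices it links to.
    Every vertex must appear as a key, even if its list is empty.
    """
    rng = random.Random(seed)
    nodes = sorted(out_edges)
    counts = Counter()
    for _ in range(num_walkers):
        v = rng.choice(nodes)  # each walker starts at a uniformly random vertex
        for _ in range(num_steps):
            neighbours = out_edges[v]
            # Teleport with probability `teleport`, or when stuck at a dangling vertex.
            if not neighbours or rng.random() < teleport:
                v = rng.choice(nodes)
            else:
                v = rng.choice(neighbours)
        counts[v] += 1
    # The fraction of walkers ending at v approximates the PageRank of v.
    return {v: counts[v] / num_walkers for v in nodes}


# Hypothetical toy graph: every leaf links to "hub", and "hub" links back to every leaf.
toy_graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}
estimate = random_walk_pagerank(toy_graph)
top_vertex = max(estimate, key=estimate.get)
```

Because each walker contributes a single count, the estimate is a quantized probability vector. More walkers sharpen it, and more steps bring it closer to the stationary PageRank distribution.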

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Resource-constrained, scalable learning

Author: Mitliagkas Ioannis
Publication date: 04/11/2015

Our unprecedented capacity for data generation and acquisition often reaches the limits of our data storage capabilities. Situations where data are generated faster or at a greater volume than can be stored demand a streaming approach. Memory is an even more valuable resource. Algorithms that use more memory than necessary can pose bottlenecks when processing high-dimensional data, and the need for memory-efficient algorithms is especially pronounced in the streaming setting. Finally, the network, along with storage, emerges as the critical bottleneck in the context of distributed computation. These computational constraints spell out a demand for efficient tools that guarantee a solution in the face of limited resources, even when the data is very noisy or highly incomplete. For the first part of this dissertation, we present our work on streaming, memory-limited Principal Component Analysis (PCA). Therein, we give the first convergence guarantees for an algorithm that solves PCA in the single-pass streaming setting. Then, we discuss the distinct challenges that arise when the received samples are overwhelmingly incomplete and present an algorithm and analysis that deal with this issue. Finally, we give a set of extensive experimental results that showcase the practical merits of our algorithm over the state of the art. The need for heavy network communication arises as the bottleneck when dealing with cluster computation. In that paradigm, a set of worker nodes is connected over the network to produce a cluster with improved computational and storage capacities. This comes with an increased need for communication across the network. In the last part of this work, we consider the problem of PageRank on graph engines. Therein, we make changes to GraphLab, a state-of-the-art platform for distributed graph computation, in a way that leads to a 7x-10x speedup for certain PageRank approximation tasks. Accompanying analysis supports the behaviour we see in our experiments.

Electrical and Computer Engineering
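
The single-pass, memory-limited PCA setting described in the abstract above can be illustrated with a small sketch. It applies a block-wise power update for the top principal component only, keeping O(dim) numbers in memory and reading each sample once. Whether this matches the dissertation's algorithm in detail is an assumption; the function name `streaming_top_component`, the block size, and the synthetic data model are hypothetical choices for illustration.

```python
import math
import random


def streaming_top_component(samples, dim, block_size=200, seed=0):
    """Single-pass estimate of the top principal component.

    Keeps only the current direction q and one accumulator (O(dim) memory).
    After each block of samples, q is replaced by the normalised value of
    sum_x (x . q) x, which is one power-iteration step on the block's
    empirical covariance.
    """
    rng = random.Random(seed)
    q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in q))
    q = [v / norm for v in q]
    acc = [0.0] * dim
    seen = 0
    for x in samples:
        proj = sum(xi * qi for xi, qi in zip(x, q))
        for i in range(dim):
            acc[i] += proj * x[i]
        seen += 1
        if seen == block_size:
            norm = math.sqrt(sum(a * a for a in acc))
            if norm > 0.0:
                q = [a / norm for a in acc]
            acc = [0.0] * dim
            seen = 0
    return q


def synthetic_stream(direction, num_samples, signal_std=3.0, noise_std=0.5, seed=1):
    """Hypothetical data model: a strong signal along `direction` plus isotropic noise."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        z = rng.gauss(0.0, signal_std)
        yield [z * d + rng.gauss(0.0, noise_std) for d in direction]


true_direction = [0.6, 0.8, 0.0, 0.0]  # unit vector
q_hat = streaming_top_component(synthetic_stream(true_direction, 4000), dim=4)
alignment = abs(sum(a * b for a, b in zip(q_hat, true_direction)))
```

An `alignment` close to 1 indicates that the recovered direction matches the planted one up to sign. The samples come from a generator, so the full data set is never held in memory.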

Texas ScholarWorks