    Fast Distributed PageRank Computation

    Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved effective in determining node importance in large graphs (and was a pioneering idea behind Google's search engine). In distributed computing alone, the PageRank vector, and random-walk-based quantities more generally, have been used for applications ranging from determining important nodes and load balancing to search and identifying connectivity structures. Surprisingly, however, there has been little work on designing provably efficient fully distributed algorithms for computing PageRank. The difficulty is that traditional matrix-vector-multiplication-style iterative methods may not adapt well to the distributed setting, owing to communication bandwidth restrictions and slow convergence rates. In this paper, we present fast random-walk-based distributed algorithms for computing PageRank in general graphs and prove strong bounds on their round complexity. We first present a distributed algorithm that takes $O(\log n/\epsilon)$ rounds with high probability on any graph (directed or undirected), where $n$ is the network size and $\epsilon$ is the reset probability used in the PageRank computation (typically a fixed constant). We then present a faster algorithm that takes $O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both algorithms are scalable, as each node sends only a small ($\mathrm{polylog}(n)$) number of bits over each edge per round. To the best of our knowledge, these are the first fully distributed algorithms for computing the PageRank vector with provably efficient running time.
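
    As a rough illustration of the random-walk view of PageRank that such algorithms build on, the sketch below estimates the PageRank vector by running walks that reset with probability $\epsilon$ at each step; the normalized visit counts approximate PageRank. This is illustrative single-machine Python, not the paper's distributed algorithm, and the function names and toy graph are invented.

        import random
        from collections import Counter

        def montecarlo_pagerank(graph, eps=0.15, walks_per_node=200):
            """Estimate PageRank from random walks that reset with
            probability eps at each step (illustrative sketch only)."""
            visits = Counter()
            total = 0
            for start in graph:                  # uniform reset distribution
                for _ in range(walks_per_node):
                    node = start
                    while True:
                        visits[node] += 1
                        total += 1
                        # the walk ends at a reset, or at a dangling node
                        if random.random() < eps or not graph[node]:
                            break
                        node = random.choice(graph[node])
            return {v: count / total for v, count in visits.items()}

        # toy directed graph as adjacency lists
        g = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
        print(montecarlo_pagerank(g))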

    FrogWild! -- Fast PageRank Approximations on Graph Engines

    We propose FrogWild, a novel algorithm for fast approximation of high-PageRank vertices, geared towards reducing the network costs of running traditional PageRank algorithms. Our algorithm can be seen as a quantized version of power iteration that performs multiple parallel random walks over a directed graph. One important innovation is a modification to the GraphLab framework that only partially synchronizes mirror vertices. This partial synchronization vastly reduces the network traffic generated by traditional PageRank algorithms, greatly reducing the per-iteration cost of PageRank. On the other hand, it also creates dependencies between the random walks used to estimate PageRank. Our main theoretical innovation is an analysis of the correlations introduced by this partial synchronization and a bound establishing that our approximation is close to the true PageRank vector. We implement our algorithm in GraphLab and compare it against the default PageRank implementation. We show that our algorithm is very fast, performing each iteration in less than one second on the Twitter graph, and can be up to 7x faster than the standard GraphLab PageRank implementation.
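
    The core idea, approximating the high-PageRank vertices by releasing many short random walkers and counting where they stop, can be sketched as follows. This is a hedged single-machine illustration of that idea, not FrogWild's partially synchronized GraphLab implementation; all names are hypothetical.

        import heapq
        import random
        from collections import Counter

        def approx_top_pagerank(graph, k=2, num_walkers=50_000, stop_prob=0.15):
            """Rank vertices by how often short random walkers stop on them;
            the top counts approximate the top PageRank vertices."""
            nodes = list(graph)
            counts = Counter()
            for _ in range(num_walkers):
                v = random.choice(nodes)                 # walkers start uniformly
                while random.random() >= stop_prob and graph[v]:
                    v = random.choice(graph[v])          # follow a random out-edge
                counts[v] += 1                           # record the stopping vertex
            return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

        g = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
        print(approx_top_pagerank(g))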

    GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

    From social networks to language modeling, the growing scale and importance of graph data have driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
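
    The abstract's claim that graph-parallel operators can be cast in relational algebra can be illustrated with a single PageRank step expressed as a map over an edge table, a group-by-destination aggregation, and a join back onto the vertex table. The sketch below is plain Python over toy tables under those assumptions; it is not GraphX's actual Scala API.

        from collections import defaultdict

        # Toy "tables": vertices maps id -> rank, edges is a list of (src, dst).
        def pagerank_step(vertices, edges, reset=0.15):
            out_degree = defaultdict(int)
            for src, _ in edges:
                out_degree[src] += 1
            # map over the edge table, then a group-by-dst aggregation
            msgs = defaultdict(float)
            for src, dst in edges:
                msgs[dst] += vertices[src] / out_degree[src]
            # join the aggregated messages back onto the vertex table
            return {v: reset + (1 - reset) * msgs[v] for v in vertices}

        verts = {"a": 1.0, "b": 1.0, "c": 1.0}
        edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "b")]
        for _ in range(20):
            verts = pagerank_step(verts, edges)
        print(verts)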

    Supplier Ranking System and Its Effect on the Reliability of the Supply Chain

    Today, due to the growing use of social media and an increase in the number of people sharing their opinions globally, customers can review products and services in many novel ways. However, since most reviewers lack in-depth technical knowledge, the true picture of product quality remains unclear. Furthermore, although product defects may originate on the supplier side, making the supplier responsible for repair costs, it is ultimately the manufacturer whose name is damaged when such defects are revealed. In this context, we need to revisit the cost-versus-quality equation. Observations of customer behavior towards brand name and reputation suggest that, contrary to the currently dominant production model, in which manufacturers are expected to control only Tier 1 suppliers and hold them responsible for all higher tiers, manufacturers should have a better hold on the entire supply chain. Said differently, the current system treats all Tier 1 parts as equally important and thus underestimates the differing impact of each piece on the final product. Another flaw of the current system is that, by sharing common parts across several different products, such as different car models of the same manufacturer, to reduce cost, only the suppliers of the most common parts are considered essential and thus get the most attention during quality control. To address these concerns, in the present study we created a parts/supplier ranking algorithm and implemented it in our supply chain system. After ranking all suppliers and parts, we calculated the minimum number of elements, from Tier 1 to Tier 4, that have to be checked in our supply chain. In doing so, we prioritized keeping the cost as low as possible while admitting the fewest possible defects.
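
    A minimal sketch of what such a parts/supplier ranking might look like, assuming a PageRank-style propagation of importance from the finished product down the dependency tiers (the paper's actual algorithm is not specified in the abstract; all names and the toy dependency list are hypothetical):

        from collections import defaultdict

        # Hypothetical sketch: edge (p, q) means part p is built from part q,
        # so importance flows from the finished product down the tiers.
        def rank_parts(edges, damping=0.85, iters=50):
            nodes = {n for e in edges for n in e}
            out = defaultdict(list)
            for p, q in edges:
                out[p].append(q)
            rank = {n: 1.0 / len(nodes) for n in nodes}
            for _ in range(iters):
                nxt = {n: (1 - damping) / len(nodes) for n in nodes}
                for p in nodes:
                    if out[p]:
                        share = damping * rank[p] / len(out[p])
                        for q in out[p]:
                            nxt[q] += share
                    else:
                        for n in nodes:  # leaf parts redistribute uniformly
                            nxt[n] += damping * rank[p] / len(nodes)
                rank = nxt
            return sorted(rank.items(), key=lambda kv: -kv[1])

        # Tier 1 assemblies feed the car; Tier 2 pieces feed the assemblies.
        deps = [("car", "engine"), ("car", "chassis"),
                ("engine", "pistons"), ("engine", "sensor"),
                ("chassis", "sensor")]
        print(rank_parts(deps))

    In this toy run the shared sensor ends up with the highest rank because two assemblies depend on it, illustrating how such a ranking can quantify each piece's impact on the final product rather than treating all Tier 1 parts as equally important.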