    Fast Distributed PageRank Computation

    Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved effective in determining node importance in large graphs (and was a pioneering idea behind Google's search engine). In distributed computing alone, the PageRank vector, and random-walk-based quantities more generally, have been used for applications ranging from determining important nodes and load balancing to search and identifying connectivity structures. Surprisingly, however, there has been little work on designing provably efficient fully distributed algorithms for computing PageRank. The difficulty is that traditional matrix-vector-multiplication-style iterative methods may not adapt well to the distributed setting, owing to communication bandwidth restrictions and slow convergence rates. In this paper, we present fast random-walk-based distributed algorithms for computing PageRank in general graphs and prove strong bounds on their round complexity. We first present a distributed algorithm that takes $O(\log n/\epsilon)$ rounds with high probability on any graph (directed or undirected), where $n$ is the network size and $\epsilon$ is the reset probability used in the PageRank computation (typically a fixed constant). We then present a faster algorithm that takes $O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both algorithms are scalable, as each node sends only a small ($\mathrm{polylog}(n)$) number of bits over each edge per round. To the best of our knowledge, these are the first fully distributed algorithms for computing the PageRank vector with provably efficient running time.
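
    As a rough illustration of the random-walk view of PageRank that such algorithms build on, the sketch below estimates the PageRank vector by running walks that reset with probability $\epsilon$ at each step; the normalized visit counts approximate PageRank. This is illustrative single-machine Python, not the paper's distributed algorithm, and the function names and toy graph are invented.

        import random
        from collections import Counter

        def montecarlo_pagerank(graph, eps=0.15, walks_per_node=200):
            """Estimate PageRank from random walks that reset with
            probability eps at each step (illustrative sketch only)."""
            visits = Counter()
            total = 0
            for start in graph:                  # uniform reset distribution
                for _ in range(walks_per_node):
                    node = start
                    while True:
                        visits[node] += 1
                        total += 1
                        # the walk ends at a reset, or at a dangling node
                        if random.random() < eps or not graph[node]:
                            break
                        node = random.choice(graph[node])
            return {v: count / total for v, count in visits.items()}

        # toy directed graph as adjacency lists
        g = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
        print(montecarlo_pagerank(g))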

    FrogWild! -- Fast PageRank Approximations on Graph Engines

    We propose FrogWild, a novel algorithm for fast approximation of high-PageRank vertices, geared towards reducing the network costs of running traditional PageRank algorithms. Our algorithm can be seen as a quantized version of power iteration that performs multiple parallel random walks over a directed graph. One important innovation is a modification to the GraphLab framework that only partially synchronizes mirror vertices. This partial synchronization vastly reduces the network traffic generated by traditional PageRank algorithms, greatly reducing the per-iteration cost of PageRank. On the other hand, it also creates dependencies between the random walks used to estimate PageRank. Our main theoretical innovation is an analysis of the correlations introduced by this partial synchronization and a bound establishing that our approximation is close to the true PageRank vector. We implement our algorithm in GraphLab and compare it against the default PageRank implementation. We show that our algorithm is very fast, performing each iteration in less than one second on the Twitter graph, and can be up to 7x faster than the standard GraphLab PageRank implementation.
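
    The core idea, approximating the high-PageRank vertices by releasing many short random walkers and counting where they stop, can be sketched as follows. This is a hedged single-machine illustration of that idea, not FrogWild's partially synchronized GraphLab implementation; all names are hypothetical.

        import heapq
        import random
        from collections import Counter

        def approx_top_pagerank(graph, k=2, num_walkers=50_000, stop_prob=0.15):
            """Rank vertices by how often short random walkers stop on them;
            the top counts approximate the top PageRank vertices."""
            nodes = list(graph)
            counts = Counter()
            for _ in range(num_walkers):
                v = random.choice(nodes)                 # walkers start uniformly
                while random.random() >= stop_prob and graph[v]:
                    v = random.choice(graph[v])          # follow a random out-edge
                counts[v] += 1                           # record the stopping vertex
            return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

        g = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
        print(approx_top_pagerank(g))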

    GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

    From social networks to language modeling, the growing scale and importance of graph data have driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
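
    The abstract's claim that graph-parallel operators can be cast in relational algebra can be illustrated with a single PageRank step expressed as a map over an edge table, a group-by-destination aggregation, and a join back onto the vertex table. The sketch below is plain Python over toy tables under those assumptions; it is not GraphX's actual Scala API.

        from collections import defaultdict

        # Toy "tables": vertices maps id -> rank, edges is a list of (src, dst).
        def pagerank_step(vertices, edges, reset=0.15):
            out_degree = defaultdict(int)
            for src, _ in edges:
                out_degree[src] += 1
            # map over the edge table, then a group-by-dst aggregation
            msgs = defaultdict(float)
            for src, dst in edges:
                msgs[dst] += vertices[src] / out_degree[src]
            # join the aggregated messages back onto the vertex table
            return {v: reset + (1 - reset) * msgs[v] for v in vertices}

        verts = {"a": 1.0, "b": 1.0, "c": 1.0}
        edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "b")]
        for _ in range(20):
            verts = pagerank_step(verts, edges)
        print(verts)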

    Supplier Ranking System and Its Effect on the Reliability of the Supply Chain

    Today, due to the growing use of social media and an increase in the number of people sharing their opinions globally, customers can review products and services in many novel ways. However, since most reviewers lack in-depth technical knowledge, the true picture of product quality remains unclear. Furthermore, although product defects may originate on the supplier side, making the supplier responsible for repair costs, it is ultimately the manufacturer whose name is damaged when such defects are revealed. In this context, we need to revisit the cost-versus-quality equation. Observations of customer behavior towards brand name and reputation suggest that, contrary to the currently dominant production model, in which manufacturers are expected to control only Tier 1 suppliers and hold them responsible for all higher tiers, manufacturers should have a better hold on the entire supply chain. Said differently, the current system treats all Tier 1 parts as equally important and thus underestimates the differing impact of each piece on the final product. Another flaw of the current system is that, by sharing common parts across several different products, such as different car models of the same manufacturer, to reduce cost, only the suppliers of the most common parts are considered essential and thus get the most attention during quality control. To address these concerns, in the present study we created a parts/supplier ranking algorithm and implemented it in our supply chain system. After ranking all suppliers and parts, we calculated the minimum number of elements, from Tier 1 to Tier 4, that have to be checked in our supply chain. In doing so, we prioritized keeping the cost as low as possible while admitting the fewest possible defects.
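
    A minimal sketch of what such a parts/supplier ranking might look like, assuming a PageRank-style propagation of importance from the finished product down the dependency tiers (the paper's actual algorithm is not specified in the abstract; all names and the toy dependency list are hypothetical):

        from collections import defaultdict

        # Hypothetical sketch: edge (p, q) means part p is built from part q,
        # so importance flows from the finished product down the tiers.
        def rank_parts(edges, damping=0.85, iters=50):
            nodes = {n for e in edges for n in e}
            out = defaultdict(list)
            for p, q in edges:
                out[p].append(q)
            rank = {n: 1.0 / len(nodes) for n in nodes}
            for _ in range(iters):
                nxt = {n: (1 - damping) / len(nodes) for n in nodes}
                for p in nodes:
                    if out[p]:
                        share = damping * rank[p] / len(out[p])
                        for q in out[p]:
                            nxt[q] += share
                    else:
                        for n in nodes:  # leaf parts redistribute uniformly
                            nxt[n] += damping * rank[p] / len(nodes)
                rank = nxt
            return sorted(rank.items(), key=lambda kv: -kv[1])

        # Tier 1 assemblies feed the car; Tier 2 pieces feed the assemblies.
        deps = [("car", "engine"), ("car", "chassis"),
                ("engine", "pistons"), ("engine", "sensor"),
                ("chassis", "sensor")]
        print(rank_parts(deps))

    In this toy run the shared sensor ends up with the highest rank because two assemblies depend on it, illustrating how such a ranking can quantify each piece's impact on the final product rather than treating all Tier 1 parts as equally important.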