1,393 research outputs found

Distributed Algorithms on Exact Personalized PageRank

Author: Cao Xin
Cong Gao
Guo Tao
Lin Xuemin
Lu Jiaheng
Publication venue: ACM
Publication date: 01/01/2017

As one of the best-known graph computation problems, Personalized PageRank is an effective approach for computing the similarity score between two nodes, and it has been widely used in applications such as link prediction and recommendation. Due to the high computational and space costs of computing the exact Personalized PageRank Vector (PPV), most existing studies compute the PPV approximately. In this paper, we propose novel and efficient distributed algorithms that compute the PPV exactly, based on graph partitioning, on a general coordinator-based share-nothing distributed computing platform. Our algorithms take three aspects into account: the load balance, the communication cost, and the computation cost of each machine. The proposed algorithms require only one round of communication between each machine and the coordinator at query time. The communication cost is bounded, and the workload on each machine is balanced. Comprehensive experiments conducted on five real datasets demonstrate the efficiency and scalability of our proposed methods. Peer reviewed
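For orientation, the quantity this abstract refers to can be sketched on a single machine. The snippet below is a minimal power-iteration sketch, not the paper's distributed algorithm; the function name `personalized_pagerank`, the teleport probability `alpha=0.15`, the handling of dangling nodes, and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(adj, source, alpha=0.15, tol=1e-12, max_iter=1000):
    """Single-machine power iteration for the PPV of `source`.

    adj maps every node to a list of its out-neighbours (each neighbour
    must also be a key of adj). alpha is the teleport probability.
    """
    nodes = list(adj)
    ppv = {v: 0.0 for v in nodes}
    ppv[source] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # teleport mass always returns to the source
        for u in nodes:
            out = adj[u]
            if out:
                share = (1.0 - alpha) * ppv[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Assumption: a dangling node sends its mass back to the source.
                nxt[source] += (1.0 - alpha) * ppv[u]
        diff = sum(abs(nxt[v] - ppv[v]) for v in nodes)
        ppv = nxt
        if diff < tol:
            break
    return ppv


# Hypothetical three-node graph used only as a usage example.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = personalized_pagerank(graph, "a")
```

Each iteration touches every edge once, which hints at why exact computation becomes expensive on large graphs and why partitioning the work across machines is attractive.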

Helsingin yliopiston digitaalinen arkisto

DR-NTU (Digital Repository of NTU)

Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

Author: Danil Nemirovsky
Elena Smirnova
Konstantin Avrachenkov
Marina Sokol
Nelly Litvak
Publication date: 01/01/2010

We study the problem of quickly detecting top-k Personalized PageRank lists. This problem has a number of important applications, such as finding local cuts in large graphs, estimating similarity distance, and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that two observations are important when finding top-k Personalized PageRank lists. Firstly, it is crucial to detect the top-k most important neighbours of a node quickly, while the exact order within the top-k list and the exact PageRank values are far less crucial. Secondly, a small number of wrong elements in a top-k list does not really degrade its quality, but tolerating them can lead to significant computational savings. Based on these two key observations, we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide a performance evaluation of the proposed methods and supply stopping criteria. Then, we apply the methods to the person name disambiguation problem. The developed algorithm for the person name disambiguation problem achieved second place in the WePS 2010 competition.
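The Monte Carlo idea described above can be illustrated with a basic end-point estimator: run many random walks from the source, stop each walk with probability `alpha` at every step, and rank nodes by how often walks end there. This is a hedged sketch of the general technique, not the specific variants or stopping criteria of the paper; `monte_carlo_top_k`, its parameters, and the toy graph are hypothetical.

```python
import random
from collections import Counter


def monte_carlo_top_k(adj, source, k, alpha=0.15, num_walks=20000, seed=0):
    """Estimate the top-k Personalized PageRank list of `source`.

    adj maps every node to a list of its out-neighbours. Each walk stops
    with probability alpha per step; the end-point frequencies estimate
    the Personalized PageRank vector.
    """
    rng = random.Random(seed)
    end_counts = Counter()
    for _ in range(num_walks):
        node = source
        while rng.random() >= alpha:  # continue with probability 1 - alpha
            out = adj[node]
            # Assumption: a walk at a dangling node jumps back to the source.
            node = rng.choice(out) if out else source
        end_counts[node] += 1
    estimates = {v: end_counts[v] / num_walks for v in adj}
    top = sorted(estimates, key=estimates.get, reverse=True)[:k]
    return top, estimates


# Hypothetical three-node graph used only as a usage example.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
top2, estimates = monte_carlo_top_k(graph, "a", k=2)
```

Because only the membership of the top-k list matters, the number of walks can be far smaller than what accurate score estimates would need, which is the saving the abstract points to.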

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

University of Twente Research Information

Exact Single-Source SimRank Computation on Large Graphs

Author: Du Xiaoyong
Wang Hanzhi
Wei Zhewei
Wen Ji-Rong
Yuan Ye
Publication venue: Association for Computing Machinery (ACM)
Publication date: 18/06/2020

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than 10^6 nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time. Comment: ACM SIGMOD 2020
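To make the scaling problem concrete, here is a minimal sketch of the all-pairs power-method iteration for SimRank on a toy graph. It is not ExactSim; the function name `simrank`, the decay factor `c=0.6`, the iteration count, and the example graph are assumptions for illustration.

```python
def simrank(in_nbrs, c=0.6, iterations=10):
    """All-pairs SimRank by repeated application of the SimRank recurrence.

    in_nbrs maps every node to the list of its in-neighbours.
    Returns a dict keyed by (u, v) pairs.
    """
    nodes = list(in_nbrs)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        nxt = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    nxt[(u, v)] = 1.0  # a node is fully similar to itself
                elif not in_nbrs[u] or not in_nbrs[v]:
                    nxt[(u, v)] = 0.0  # no in-neighbours means no evidence
                else:
                    total = sum(sim[(a, b)]
                                for a in in_nbrs[u] for b in in_nbrs[v])
                    nxt[(u, v)] = c * total / (len(in_nbrs[u]) * len(in_nbrs[v]))
        sim = nxt
    return sim


# Hypothetical graph: node "u" points to both "a" and "b".
in_neighbours = {"u": [], "a": ["u"], "b": ["u"]}
sim = simrank(in_neighbours)
```

The table holds one entry per node pair, so its size grows quadratically with the number of nodes, which is consistent with the abstract's remark that this approach breaks down beyond roughly 10^6 nodes.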

arXiv.org e-Print Archive

Crossref

Fast matrix computations for pair-wise and column-wise commute times and Katz scores

Author: Andersen Reid
Boldi Paolo
Chung Fan
Davis P. J.
Golub Gene H.
Lanczos Cornelius
Liben-Nowell David
McSherry Frank
Mihail Milena
Varga R. S.
Publication venue: Informa UK Limited
Publication date: 19/04/2011

We first explore methods for approximating the commute time and Katz score between a pair of nodes. These methods are based on the approach of matrices, moments, and quadrature developed in the numerical linear algebra community. They rely on the Lanczos process and provide upper and lower bounds on an estimate of the pair-wise scores. We also explore methods to approximate the commute times and Katz scores from a node to all other nodes in the graph. Here, our approach for the commute times is based on a variation of the conjugate gradient algorithm, and it provides an estimate of all the diagonals of the inverse of a matrix. Our technique for the Katz scores is based on exploiting an empirical localization property of the Katz matrix. We adapt algorithms used for personalized PageRank computation to these Katz scores and theoretically show that this approach is convergent. We evaluate these methods on 17 real-world graphs ranging in size from 1,000 to 1,000,000 nodes. Our results show that our pair-wise commute time method and column-wise Katz algorithm both have attractive theoretical properties and empirical performance. Comment: 35 pages, journal version of http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for publication. Please see http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for supplemental code
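As background for the column-wise Katz scores mentioned above, the Katz score between nodes i and j is the sum over walk lengths l >= 1 of alpha^l times the number of length-l walks between them. The sketch below sums that series directly for one source node. It is a plain truncated-series illustration, not the localized algorithm of the paper; `katz_column`, its parameters, and the two-node graph are hypothetical, and the series only converges when alpha is below the reciprocal of the adjacency matrix's largest eigenvalue.

```python
def katz_column(adj, source, alpha, tol=1e-12, max_iter=1000):
    """Katz scores from `source` to every node of an undirected graph.

    adj maps every node to a list of its neighbours (symmetric).
    Accumulates alpha^l * A^l * e_source for l = 1, 2, ... until the
    newest term is negligible.
    """
    term = {v: 0.0 for v in adj}
    term[source] = 1.0
    scores = {v: 0.0 for v in adj}
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in adj}
        for u, mass in term.items():
            if mass:
                for v in adj[u]:
                    nxt[v] += alpha * mass  # one more step of every walk
        term = nxt
        for v in adj:
            scores[v] += term[v]
        if max(term.values()) < tol:
            break
    return scores


# Hypothetical two-node path a - b used only as a usage example.
path = {"a": ["b"], "b": ["a"]}
katz = katz_column(path, "a", alpha=0.5)
```

On this two-node path the series has a closed form, alpha / (1 - alpha^2) for the neighbour and alpha^2 / (1 - alpha^2) for the source itself, which gives a quick sanity check on the output.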

arXiv.org e-Print Archive

Crossref