
4 research outputs found

Walking in the Cloud: Parallel SimRank at Scale

Authors: Cheng J, Cheng RCK, Fang Y, Li Z, Liu Q, Lui JCS
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2015

Accepted Poster

HKU Scholars Hub

Exact Single-Source SimRank Computation on Large Graphs

Authors: Du Xiaoyong, Wang Hanzhi, Wei Zhewei, Wen Ji-Rong, Yuan Ye
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/06/2020

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than 10^6 nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.

Comment: ACM SIGMOD 202
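
The Power Method mentioned in the abstract above can be illustrated with a minimal sketch. This is not ExactSim and not the authors' code; it is the textbook all-pairs iteration of the SimRank recurrence, written under a few assumptions: the graph is given as a hypothetical `in_neighbors` dictionary mapping each node to its list of in-neighbors, the decay factor `c = 0.6` is an illustrative choice, and the function name `simrank_power_method` is made up for this example. Each iteration touches every node pair, which hints at why the approach does not scale to graphs with millions of nodes.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """All-pairs SimRank by repeated application of the SimRank recurrence.

    in_neighbors: dict mapping every node to a list of its in-neighbors
                  (hypothetical input format chosen for this sketch).
    c:            decay factor in (0, 1); 0.6 is an assumed default.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, 0 otherwise.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}

    for _ in range(iterations):
        new_sim = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new_sim[u][v] = 1.0
                    continue
                preds_u, preds_v = in_neighbors[u], in_neighbors[v]
                if not preds_u or not preds_v:
                    # By definition, similarity is 0 if either node has no in-neighbors.
                    new_sim[u][v] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, scaled by c.
                total = sum(sim[a][b] for a in preds_u for b in preds_v)
                new_sim[u][v] = c * total / (len(preds_u) * len(preds_v))
        sim = new_sim
    return sim


# Tiny example: node "a" points to both "b" and "c".
example_graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank_power_method(example_graph)
```

In this toy graph, "b" and "c" share their only in-neighbor, so their score settles at `c`, while "a" has no in-neighbors and scores 0 against every other node.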

arXiv.org e-Print Archive

Crossref

PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

Authors: Du Xiaoyong, He Xiaodong, Liu Yu, Wang Sibo, Wei Zhewei, Wen Ji-Rong, Xiao Xiaokui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/05/2019

SimRank is a classic measure of the similarities of nodes in a graph. Given a node u in graph G = (V, E), a single-source SimRank query returns the SimRank similarities s(u, v) between node u and each node v ∈ V. This type of query has numerous applications in web search and social network analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear in the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes PRSim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. PRSim uses an index of size O(m), where m is the number of edges in the graph, and guarantees a query time that depends on the reverse PageRank distribution of the input graph. In particular, we prove that PRSim runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that PRSim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.

Comment: ACM SIGMOD 201
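
To make the single-source query in the abstract above concrete, here is a rough Monte Carlo sketch. It is not PRSim and uses no index; it is a plain baseline that relies on the known random-walk view of SimRank, in which s(u, v) equals the probability that two independent walks from u and v, each continuing with probability sqrt(c) to a uniformly random in-neighbor, occupy the same node at the same step. The names `sqrt_c_walk` and `single_source_simrank_mc`, the `in_neighbors` dictionary format, and the defaults (`c = 0.6`, `samples`, `max_steps`, `seed`) are all assumptions made for illustration. The loop over every target node shows the linear-in-n query cost that the abstract describes for existing methods.

```python
import math
import random


def sqrt_c_walk(start, in_neighbors, sqrt_c, rng, max_steps):
    """Return the nodes visited by one sqrt(c)-walk, starting node included."""
    path = [start]
    node = start
    for _ in range(max_steps):
        preds = in_neighbors[node]
        # Stop at a node without in-neighbors, or with probability 1 - sqrt(c).
        if not preds or rng.random() >= sqrt_c:
            break
        node = rng.choice(preds)
        path.append(node)
    return path


def single_source_simrank_mc(source, in_neighbors, c=0.6, samples=20000,
                             max_steps=20, seed=0):
    """Estimate s(source, v) for every node v by sampling pairs of walks.

    Truncating walks at max_steps introduces a small bias, so this is an
    approximation even before sampling error is considered.
    """
    rng = random.Random(seed)
    sqrt_c = math.sqrt(c)
    estimates = {}
    for target in in_neighbors:  # one pass per node: cost grows linearly with n
        if target == source:
            estimates[target] = 1.0
            continue
        meetings = 0
        for _ in range(samples):
            walk_u = sqrt_c_walk(source, in_neighbors, sqrt_c, rng, max_steps)
            walk_v = sqrt_c_walk(target, in_neighbors, sqrt_c, rng, max_steps)
            # zip stops at the shorter walk, so both walks are alive at each step compared.
            if any(a == b for a, b in zip(walk_u, walk_v)):
                meetings += 1
        estimates[target] = meetings / samples
    return estimates


# Tiny example: node "a" points to both "b" and "c"; query from "b".
example_graph = {"a": [], "b": ["a"], "c": ["a"]}
estimates = single_source_simrank_mc("b", example_graph)
```

On this toy graph the estimate for "c" should land close to `c`, since both walks must survive their first step to meet at "a".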

arXiv.org e-Print Archive

Crossref