Search CORE

9 research outputs found

Efficient simrank-based similarity join over large graphs

Author: Abbassi Z.
Albert Réka
Chan E. P. F.
Cheng J.
Dinur I.
Fan W.
Fogaras D.
Gentle J. E.
Jeh G.
Karakostas G.
Karypis G.
Kessler M. M.
Khan A.
Khot S.
Li C.
Li P.
Li P.
Liben-Nowell D.
Lizorkin D.
McCallum A.
McPherson M.
Potamias M.
Small H.
Sun L.
Tretyakov K.
Trißl S.
Wang H.
Zhang S.
Zhang S.
Zhao P.
Zou L.
Publication venue: 'VLDB Endowment'
Publication date
Field of study

Exact Single-Source SimRank Computation on Large Graphs

Author: Du Xiaoyong
Wang Hanzhi
Wei Zhewei
Wen Ji-Rong
Yuan Ye
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/06/2020
Field of study

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-

k

SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than

10^6

nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-

k

SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.Comment: ACM SIGMOD 202

arXiv.org e-Print Archive

Crossref

An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

Author
Publication venue
Publication date: 06/03/2020
Field of study

ABSTRACT SimRank is an important measure of vertex-pair similarity according to the structure of graphs. The similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Nowadays, graphs in the real world become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost. None of them can efficiently support similarity search over large dynamic graphs. In this paper, we propose a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search). In the preprocessing stage, TSF samples a set of one-way graphs to index raw random walks in a novel manner within O(N Rg) time and space, where N is the number of vertices and Rg is the number of one-way graphs. The one-way graph can be efficiently updated in accordance with the graph modification, thus TSF is well suited to dynamic graphs. During the query stage, TSF can search similar vertices fast by naturally pruning unqualified vertices based on the connectivity of one-way graphs. Furthermore, with additional Rq samples, TSF can estimate the SimRank score with probabil- (1−c) 2 if the error of approximation is bounded by 1 − ǫ. Finally, to guarantee the scalability of TSF, the one-way graphs can also be compactly stored on the disk when the memory is limited. Extensive experiments have demonstrated that TSF can handle dynamic billion-edge graphs with high performance

CiteSeerX

Efficient partial-pairs simrank search on large networks

Author: Sun L.
Publication venue: 'VLDB Endowment'
Publication date
Field of study

Crossref

Sequence queries on temporal graphs

Author: Zhu Haohan
Publication venue
Publication date: 21/06/2016
Field of study

Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC

Boston University Institutional Repository (OpenBU)

Efficient SimRank-based Similarity Join Over Large Graphs

Author: Dongyan Zhao
Lei Chen
Lei Zou
Weiguo Zheng
Yansong Feng
Publication venue
Publication date: 01/01/2013
Field of study

Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, which can benefit friend recommendation in social networks and link prediction, etc. In this paper, we adopt “SimRank ” to evaluate the similarity of two vertices in a large graph because of its generality. Note that “Sim-Rank ” is purely structure dependent and it does not rely on the domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all the vertex pairs satisfying the threshold in a data graph G. In order to reduce the search space, we propose an estimated shortest-path distance based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover, to efficiently compute the SimRank score of a single vertex pair. Given a graph G, we only materialize the SimRank scores of a small proportion of vertex pairs (called h-go covers), based on which, the SimRank score of any vertex pair can be computed easily. In order to handle large graphs, we extend our technique to the partition-based framework. Thorough theoretical analysis and extensive experiments over both real and synthetic datasets confirm the efficiency and effectiveness of our solution

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository