Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme
Personalized PageRank (PPR) stands as a fundamental proximity measure
in graph mining. Since computing an exact single-source PPR (SSPPR) query answer is prohibitive,
most existing solutions turn to approximate queries with guarantees. The
state-of-the-art solutions for approximate SSPPR queries are index-based and
mainly focus on static graphs, while real-world graphs are usually dynamically
changing. However, existing index-update schemes cannot achieve a sub-linear
update time. Motivated by this, we present an efficient indexing scheme to
maintain indexed random walks in expected time after each graph update.
To reduce the space consumption, we further propose a new sampling scheme to
remove the auxiliary data structure for vertices while still supporting
index update cost on evolving graphs. Extensive experiments show that our
update scheme achieves orders of magnitude speed-up on update performance over
existing index-based dynamic schemes without sacrificing the query efficiency.
Exact Single-Source SimRank Computation on Large Graphs
SimRank is a popular measure for evaluating node-to-node similarities
based on the graph topology. In recent years, single-source and top-k SimRank
queries have received increasing attention due to their applications in web
mining, social network analysis, and spam detection. However, a fundamental
obstacle in studying SimRank has been the lack of ground truths. The only exact
algorithm, Power Method, is computationally infeasible on graphs with more than
nodes. Consequently, no existing work has evaluated the actual
trade-offs between query time and accuracy on large real-world graphs. In this
paper, we present ExactSim, the first algorithm that computes the exact
single-source and top-k SimRank results on large graphs. With high
probability, this algorithm produces ground truths with a rigorous theoretical
guarantee. We conduct extensive experiments on real-world datasets to
demonstrate the efficiency of ExactSim. The results show that ExactSim provides
the ground truth for any single-source SimRank query with a precision up to 7
decimal places within a reasonable query time.
Comment: ACM SIGMOD 202
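The Power Method named in the abstract above can be sketched as a fixed-point iteration over all node pairs. This is a minimal illustration, not the ExactSim algorithm: the function name `simrank_power_method`, the decay factor of 0.6, and the fixed iteration count are assumptions chosen for the example. Every iteration touches every pair of nodes, which hints at why the method does not scale to large graphs.

```python
from itertools import product


def simrank_power_method(in_neighbors, decay=0.6, iterations=50):
    """All-pairs SimRank by repeated application of the SimRank recurrence.

    in_neighbors maps every node to the list of nodes that point at it.
    decay and iterations are illustrative defaults (assumptions).
    """
    nodes = list(in_neighbors)
    # Start from s_0(a, b) = 1 if a == b, else 0.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[(a, b)] = 1.0  # a node is maximally similar to itself
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                nxt[(a, b)] = 0.0  # no in-neighbors, nothing to compare
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            nxt[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = nxt
    return sim
```

For a graph in which node a points at both b and c, the similarity of b and c equals the decay factor, because their only in-neighbor is the same node.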
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
SimRank is a classic measure of the similarities of nodes in a graph.
Given a query node in a graph, a single-source SimRank query returns the
SimRank similarities between that node and every node in the graph. This type
of query has numerous applications in web search and social network analysis,
such as link prediction, web mining, and spam detection. Existing methods for
single-source SimRank queries, however, incur a query cost at least linear in
the number of nodes, which renders them inapplicable for real-time and
interactive analysis.
This paper proposes PRSim, an algorithm that exploits the structure of
graphs to efficiently answer single-source SimRank queries. PRSim uses an
index whose size depends on the number of edges in the graph, and
guarantees a query time that depends on the reverse PageRank distribution
of the input graph. In particular, we prove that PRSim runs in sub-linear time
if the degree distribution of the input graph follows the power-law
distribution, a property possessed by many real-world graphs. Based on the
theoretical analysis, we show that the empirical query time of all existing
SimRank algorithms also depends on the reverse PageRank distribution of the
graph. Finally, we present the first experimental study that evaluates the
absolute errors of various SimRank algorithms on large graphs, and we show that
PRSim outperforms the state of the art in terms of query time, accuracy, index
size, and scalability.
Comment: ACM SIGMOD 201
An Incrementally Expanding Approach for Updating PageRank on Dynamic Graphs
PageRank is a popular centrality metric that assigns importance to the
vertices of a graph based on their neighbors and their scores. Efficient parallel
algorithms for updating PageRank on dynamic graphs are crucial for various
applications, especially as dataset sizes have reached substantial scales. This
technical report presents our Dynamic Frontier approach. Given a batch update
of edge deletions and insertions, it progressively identifies, with minimal
overhead, the affected vertices that are likely to change their ranks. On a server
equipped with a 64-core AMD EPYC-7742 processor, our Dynamic Frontier PageRank
outperforms Static, Naive-dynamic, and Dynamic Traversal PageRank by 7.8x,
2.9x, and 3.9x respectively on uniformly random batch updates of size 10^-7
|E| to 10^-3 |E|. In addition, our approach improves performance at an average
rate of 1.8x for every doubling of threads.
Comment: 11 pages, 14 figures, 1 table
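The idea of progressively identifying affected vertices can be illustrated with a small sequential sketch. It is one plausible reading of the abstract, not the report's parallel implementation: the function names, the seeding rule in `initially_affected`, the `frontier_tol` threshold, and the assumption that every vertex has at least one out-edge are all choices made for this example.

```python
def in_edges_of(out_edges):
    """Build in-neighbor lists from out-neighbor lists."""
    incoming = [[] for _ in out_edges]
    for u, targets in enumerate(out_edges):
        for v in targets:
            incoming[v].append(u)
    return incoming


def pagerank(out_edges, ranks=None, affected=None, damping=0.85,
             tol=1e-12, frontier_tol=1e-13, max_iter=1000):
    """Jacobi-style PageRank that only recomputes vertices in `affected`.

    With ranks=None and affected=None this is ordinary static PageRank.
    Assumes no dangling vertices (every vertex has an out-edge).
    """
    n = len(out_edges)
    incoming = in_edges_of(out_edges)
    ranks = [1.0 / n] * n if ranks is None else list(ranks)
    affected = set(range(n)) if affected is None else set(affected)
    for _ in range(max_iter):
        # Compute new ranks from the previous iteration's values only.
        updates = {}
        for v in affected:
            pulled = sum(ranks[u] / len(out_edges[u]) for u in incoming[v])
            updates[v] = (1.0 - damping) / n + damping * pulled
        max_change = 0.0
        newly_affected = set()
        for v, new_rank in updates.items():
            change = abs(new_rank - ranks[v])
            max_change = max(max_change, change)
            if change > frontier_tol:
                # The rank of v moved, so its out-neighbors join the frontier.
                newly_affected.update(out_edges[v])
            ranks[v] = new_rank
        affected |= newly_affected
        if max_change <= tol:
            break
    return ranks


def initially_affected(new_out_edges, changed_edges):
    """Seed set for a batch of inserted or deleted edges (u, v)."""
    seeds = set()
    for u, v in changed_edges:
        seeds.add(v)                     # v gained or lost the share of u
        seeds.update(new_out_edges[u])   # out-degree of u changed
    return seeds
```

Starting from the ranks of the old graph and the seed set, `pagerank(new_graph, ranks=old_ranks, affected=seeds)` only visits vertices whose inputs have actually moved, while a static run recomputes every vertex in every iteration.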
A Fast Computation Method for Personalized PageRank on Large Graphs
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 이상구.
Computation of Personalized PageRank (PPR) in graphs is an important function that is widely utilized in myriad application domains such as search, recommendation, and knowledge discovery. Because the computation of PPR is an expensive process, a good number of innovative and efficient algorithms for computing PPR have been developed. However, efficient computation of PPR within very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their capability of handling dynamic graphs. In this paper, we present a fast converging algorithm that guarantees high and controlled precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation and initial guess revision, a vector reuse strategy. The proposed method vastly improves on the traditional Power Iteration in terms of convergence rate and computation time, while retaining its simplicity and strictness. Since it can reuse the previously computed vectors for refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, we can flexibly control the trade-off between accuracy and time, a feature lacking in both sampling-based approximation methods and fully exact methods. Experiments show that the proposed algorithm is at least 20 times faster than the Power Iteration and outperforms other state-of-the-art algorithms.
1 Introduction
2 Preliminaries: Personalized PageRank
2.1 Random Walk, PageRank, and Personalized PageRank
2.1.1 Basics on Random Walk
2.1.2 PageRank
2.1.3 Personalized PageRank
2.2 Characteristics of Personalized PageRank
2.3 Applications of Personalized PageRank
2.4 Previous Work on Personalized PageRank Computation
2.4.1 Basic Algorithms
2.4.2 Enhanced Power Iteration
2.4.3 Bookmark Coloring Algorithm
2.4.4 Dynamic Programming
2.4.5 Monte-Carlo Sampling
2.4.6 Enhanced Direct Solving
2.5 Summary
3 Personalized PageRank Computation with Initial Guess Revision
3.1 Initial Guess Revision and Relaxation
3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
3.3 Initial Guess Construction Algorithm for Personalized PageRank
4 Fully Personalized PageRank Algorithm with Initial Guess Revision
4.1 FPPR with IGR
4.2 Optimization
4.3 Experiments
5 Personalized PageRank Query Processing with Initial Guess Revision
5.1 PPR Query Processing with IGR
5.2 Optimization
5.3 Experiments
6 Conclusion
Bibliography
Appendix
Abstract (In Korean)
Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees
Given a time-evolving graph, how can we track similarity between nodes in a
fast and accurate way, with theoretical guarantees on the convergence and the
error? Random Walk with Restart (RWR) is a popular measure to estimate the
similarity between nodes and has been exploited in numerous applications. Many
real-world graphs are dynamic with frequent insertion/deletion of edges; thus,
tracking RWR scores on dynamic graphs in an efficient way has aroused much
interest among data mining researchers. Recently, dynamic RWR models based on
the propagation of scores across a given graph have been proposed, and they
have outperformed other previous approaches to computing RWR
dynamically. However, those models fail to guarantee exactness and convergence
time for updating RWR in a generalized form. In this paper, we propose OSP, a
fast and accurate algorithm for computing dynamic RWR with insertion/deletion
of nodes/edges in a directed/undirected graph. When the graph is updated, OSP
first calculates offset scores around the modified edges, propagates the offset
scores across the updated graph, and then merges them with the current RWR
scores to get updated RWR scores. We prove the exactness of OSP and introduce
OSP-T, a version of OSP which regulates a trade-off between accuracy and
computation time by using an error tolerance ε. Given restart probability
c, OSP-T guarantees to return RWR scores with O(ε/c) error in
O(log(ε/2)/log(1-c)) iterations. Through extensive experiments, we
show that OSP tracks RWR exactly up to 4605x faster than the existing static RWR
method on dynamic graphs, and OSP-T requires up to 15x less time with 730x
lower L1 norm error and 3.3x lower rank error than other state-of-the-art
dynamic RWR methods.
Comment: 10 pages, 8 figures
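The three steps described above (compute offset scores, propagate them over the updated graph, merge them with the current scores) can be sketched under the standard RWR recurrence r = c q + (1 - c) A^T r, where A is the row-normalised adjacency matrix. This is a minimal sketch under that assumption and not the authors' code: the function names, the adjacency-list representation, the stopping rule, and the requirement that every node has at least one out-edge are all illustrative. If r_old solves the recurrence on the old graph, the difference between the new and old scores solves the same recurrence on the new graph with the seed (1 - c)(B^T - A^T) r_old, where B is the updated matrix.

```python
def push(graph, x, c):
    """Return (1 - c) * A^T x for a row-normalised adjacency list."""
    y = [0.0] * len(graph)
    for u, nbrs in enumerate(graph):
        share = (1.0 - c) * x[u] / len(nbrs)
        for v in nbrs:
            y[v] += share
    return y


def cumulative_power_iteration(graph, seed, c, eps=1e-12):
    """Sum seed + M seed + M^2 seed + ... with M = (1 - c) A^T.

    Stops once the L1 norm of the current term drops below eps.
    """
    total = list(seed)
    term = list(seed)
    while sum(abs(t) for t in term) >= eps:
        term = push(graph, term, c)
        total = [a + b for a, b in zip(total, term)]
    return total


def rwr(graph, source, c, eps=1e-12):
    """RWR scores from scratch for a single source node."""
    seed = [0.0] * len(graph)
    seed[source] = c
    return cumulative_power_iteration(graph, seed, c, eps)


def update_rwr(old_graph, new_graph, old_scores, c, eps=1e-12):
    """Refresh RWR scores after the graph changes.

    1. Offset seed: how one propagation step differs between the graphs.
    2. Propagate the offset seed over the updated graph.
    3. Merge the propagated offsets with the old scores.
    """
    before = push(old_graph, old_scores, c)
    after = push(new_graph, old_scores, c)
    offset_seed = [a - b for a, b in zip(after, before)]
    offsets = cumulative_power_iteration(new_graph, offset_seed, c, eps)
    return [r + d for r, d in zip(old_scores, offsets)]
```

The sketch computes the offset seed with two full propagation steps for brevity. The seed is non-zero only at out-neighbors of nodes whose edges changed, so a practical version would touch just those entries.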