Efficient Estimation of Heat Kernel PageRank for Local Clustering
Given an undirected graph G and a seed node s, the local clustering problem
aims to identify a high-quality cluster containing s in time roughly
proportional to the size of the cluster, regardless of the size of G. This
problem finds numerous applications on large-scale graphs. Recently, heat
kernel PageRank (HKPR), a measure of the proximity of nodes in graphs, has
been applied to this problem and found to be more efficient than prior
methods. However, existing solutions for computing HKPR are either
prohibitively expensive or provide unsatisfactory error guarantees on HKPR
values, rendering them impractical, especially on billion-edge graphs.
In this paper, we present TEA and TEA+, two novel local graph clustering
algorithms based on HKPR, to address the aforementioned limitations.
Specifically, these algorithms provide non-trivial theoretical guarantees on
the relative error of HKPR values and on time complexity. The basic idea is to
use deterministic graph traversal to produce a rough estimate of the exact
HKPR vector, and then exploit Monte-Carlo random walks to refine the results in
an optimized and non-trivial way. In particular, TEA+ offers practical
efficiency and effectiveness due to non-trivial optimizations. Extensive
experiments on real-world datasets demonstrate that TEA+ outperforms the
state-of-the-art algorithm by more than four times on most benchmark datasets
in terms of computational time when achieving the same clustering quality, and
in particular, is an order of magnitude faster on large graphs including the
widely studied Twitter and Friendster datasets.
Comment: The technical report for the full research paper accepted in the
SIGMOD 201
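The combination of a deterministic pass and Monte-Carlo random walks described in the abstract above can be illustrated with a minimal sketch. This is not TEA or TEA+: it only assumes the standard HKPR definition (a Poisson-weighted sum of random-walk distributions with heat constant t) and compares a truncated deterministic series against a plain random-walk estimator. The function names, the example graph, and the parameter values are hypothetical.

```python
import math
import random

# Hypothetical example graph: two triangles joined by the edge 2-3 (undirected).
EXAMPLE_ADJ = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]

def hkpr_series(adj, seed, t=5.0, terms=40):
    """Deterministic estimate: sum_k e^{-t} t^k / k! * (walk distribution after k steps)."""
    n = len(adj)
    dist = [0.0] * n
    dist[seed] = 1.0
    hkpr = [0.0] * n
    weight = math.exp(-t)  # Poisson(t) probability of k = 0
    for k in range(terms):
        for v in range(n):
            hkpr[v] += weight * dist[v]
        # Advance the random-walk distribution by one step.
        nxt = [0.0] * n
        for u in range(n):
            if dist[u] == 0.0:
                continue
            share = dist[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
        weight *= t / (k + 1)  # move to the Poisson probability of k + 1
    return hkpr

def hkpr_monte_carlo(adj, seed, t=5.0, walks=20000, rng=None):
    """Monte-Carlo estimate: walk a Poisson(t) number of steps and count end nodes."""
    rng = rng or random.Random(0)
    counts = [0] * len(adj)
    for _ in range(walks):
        # Sample Poisson(t) as the number of unit-rate arrivals before time t.
        length = 0
        elapsed = rng.expovariate(1.0)
        while elapsed < t:
            length += 1
            elapsed += rng.expovariate(1.0)
        node = seed
        for _ in range(length):
            node = rng.choice(adj[node])
        counts[node] += 1
    return [c / walks for c in counts]
```

Calling `hkpr_series(EXAMPLE_ADJ, 0)` and `hkpr_monte_carlo(EXAMPLE_ADJ, 0)` should give vectors that agree up to sampling noise. The sketch assumes every node has at least one neighbor.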
Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees
Given a time-evolving graph, how can we track similarity between nodes in a
fast and accurate way, with theoretical guarantees on the convergence and the
error? Random Walk with Restart (RWR) is a popular measure to estimate the
similarity between nodes and has been exploited in numerous applications. Many
real-world graphs are dynamic with frequent insertion/deletion of edges; thus,
tracking RWR scores on dynamic graphs in an efficient way has aroused much
interest among data mining researchers. Recently, dynamic RWR models based on
the propagation of scores across a given graph have been proposed and have
outperformed previous approaches to computing RWR dynamically. However, those
models fail to guarantee exactness and convergence time for updating RWR in a
generalized form. In this paper, we propose OSP, a
fast and accurate algorithm for computing dynamic RWR with insertion/deletion
of nodes/edges in a directed/undirected graph. When the graph is updated, OSP
first calculates offset scores around the modified edges, propagates the offset
scores across the updated graph, and then merges them with the current RWR
scores to get updated RWR scores. We prove the exactness of OSP and introduce
OSP-T, a version of OSP which regulates a trade-off between accuracy and
computation time by using an error tolerance $\epsilon$. Given restart
probability $c$, OSP-T is guaranteed to return RWR scores with $O(\epsilon/c)$
error in $O(\log(\epsilon/2)/\log(1-c))$ iterations. Through extensive
experiments, we show that OSP tracks RWR exactly up to 4605x faster than an
existing static RWR method on dynamic graphs, and that OSP-T requires up to 15x
less time with 730x lower L1 norm error and 3.3x lower rank error than other
state-of-the-art dynamic RWR methods.
Comment: 10 pages, 8 figures
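The three steps named in the abstract above (compute offset scores, propagate them over the updated graph, merge with the current scores) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the offset formula is derived here from the RWR fixed-point equation r = c q + (1 - c) Ã^T r, and the sketch assumes a directed graph stored as out-neighbor lists in which every node has at least one out-edge. All function names are hypothetical.

```python
def push_once(out_nbrs, vec):
    """One step of score propagation: returns A~^T vec for the row-normalized graph."""
    nxt = [0.0] * len(out_nbrs)
    for u, nbrs in enumerate(out_nbrs):
        if vec[u] == 0.0:
            continue
        share = vec[u] / len(nbrs)
        for v in nbrs:
            nxt[v] += share
    return nxt

def propagate(out_nbrs, start, c, tol=1e-12):
    """Sum of ((1 - c) A~^T)^i start over i >= 0, stopped once a term's L1 norm is below tol."""
    total = list(start)
    term = list(start)
    while sum(abs(x) for x in term) > tol:
        term = [(1.0 - c) * x for x in push_once(out_nbrs, term)]
        total = [a + b for a, b in zip(total, term)]
    return total

def static_rwr(out_nbrs, seed, c, tol=1e-12):
    """RWR from scratch: propagate the restart mass c placed on the seed node."""
    start = [0.0] * len(out_nbrs)
    start[seed] = c
    return propagate(out_nbrs, start, c, tol)

def dynamic_update(old_nbrs, new_nbrs, r_old, c, tol=1e-12):
    """Offset-score update: r_new = r_old + propagation of (1 - c)(A~_new^T - A~_old^T) r_old."""
    step_new = push_once(new_nbrs, r_old)
    step_old = push_once(old_nbrs, r_old)
    offset = [(1.0 - c) * (a - b) for a, b in zip(step_new, step_old)]
    spread = propagate(new_nbrs, offset, c, tol)  # propagate over the updated graph
    return [r + d for r, d in zip(r_old, spread)]
```

Each propagated term shrinks in L1 norm by a factor of at most (1 - c), so the loop length grows like log(tol)/log(1 - c), which is in line with the iteration bound quoted in the abstract. A practical version would compute the offset only around the modified edges rather than pushing the whole vector twice.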
A Hybrid Web Recommendation System based on the Improved Association Rule Mining Algorithm
With growing interest in web recommendation systems that deliver customized
data to their users, we began work on such a system. Recommendation systems
generally fall into two major categories: collaborative and content-based.
Collaborative recommendation systems seek out users who share the tastes of a
given user and recommend websites according to that user's likes.
Content-based recommendation systems, by contrast, recommend websites similar
to those the user has liked. Recent research has proposed an efficient
technique based on association rule mining to solve the web page
recommendation problem. Its major drawback is that all web pages are given
equal importance, whereas the importance of a page varies with how frequently
it is visited and how much time the user spends on it. In addition, newly
added pages and pages not yet visited by users are not included in the
recommendation set. To overcome this problem, we used the web usage log in
adaptive association-rule-based web mining, where the association rules were
applied to personalization. This algorithm was based purely on the Apriori
data mining algorithm for generating association rules. However, this method
also suffers from some unavoidable drawbacks. In this paper, we present and
investigate a new approach based on a weighted association rule mining
algorithm and text mining. This improved algorithm adds semantic knowledge to
the results, is more efficient, and hence gives better quality and performance
than existing approaches.
Comment: 9 pages, 7 figures, 2 tables
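The idea of weighting pages by visit frequency and time spent, as described in the abstract above, can be illustrated with a small sketch. The abstract gives no formula, so everything here is an assumption made for illustration: the page weight (an equal blend of normalized visit count and normalized total dwell time), the weighted support (mean page weight times plain support), and the session data are all hypothetical.

```python
# Hypothetical usage log: each session maps a page to the seconds spent on it.
SESSIONS = [
    {"home": 10, "docs": 120},
    {"home": 5, "blog": 30},
    {"home": 8, "docs": 200, "blog": 20},
    {"docs": 60},
]

def page_weights(sessions):
    """Assumed weighting: 0.5 * normalized visit count + 0.5 * normalized dwell time."""
    visits, seconds = {}, {}
    for session in sessions:
        for page, spent in session.items():
            visits[page] = visits.get(page, 0) + 1
            seconds[page] = seconds.get(page, 0) + spent
    max_visits = max(visits.values())
    max_seconds = max(seconds.values())
    return {
        page: 0.5 * visits[page] / max_visits + 0.5 * seconds[page] / max_seconds
        for page in visits
    }

def support(sessions, pages):
    """Plain support: fraction of sessions that contain every page in the set."""
    hits = sum(1 for session in sessions if all(p in session for p in pages))
    return hits / len(sessions)

def weighted_support(sessions, pages, weights):
    """Assumed weighted support: mean weight of the pages times their plain support."""
    mean_weight = sum(weights[p] for p in pages) / len(pages)
    return mean_weight * support(sessions, pages)
```

With this data, the page sets {home, docs} and {home, blog} have the same plain support, but the weighted variant ranks {home, docs} higher because users spend far more time on docs. Apriori-style candidate generation would then prune on the weighted value instead of the plain one.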
Fast Computation Techniques for Personalized PageRank on Large Graphs
Thesis (Ph.D.) -- Seoul National University Graduate School: College of
Engineering, Department of Electrical and Computer Engineering, 2020. 8. 이상구.
Computation of Personalized PageRank (PPR) in graphs is an important function
that is widely used in many application domains such as search,
recommendation, and knowledge discovery. Because the computation of PPR is an
expensive process, a good number of innovative and efficient algorithms for
computing PPR have been developed. However, efficient computation of PPR
within very large graphs with millions of nodes or more is still an open
problem. Moreover, previously proposed algorithms cannot handle updates
efficiently, which severely limits their ability to handle dynamic graphs. In
this paper, we present a fast converging algorithm that guarantees high and
controlled precision. We improve the convergence rate of the traditional Power
Iteration method by adopting successive over-relaxation and initial guess
revision, a vector reuse strategy. The proposed method vastly improves on the
traditional Power Iteration in terms of convergence rate and computation time,
while retaining its simplicity and strictness. Since it can reuse previously
computed vectors when refreshing PPR vectors, its update performance is also
greatly enhanced. Also, since the algorithm halts as soon as it reaches a given
error threshold, we can flexibly control the trade-off between accuracy and
time, a feature lacking in both sampling-based approximation methods and fully
exact methods. Experiments show that the proposed algorithm is at least 20
times faster than the Power Iteration and outperforms other state-of-the-art
algorithms.
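The two ingredients named in the abstract above, successive over-relaxation and reuse of a previously computed vector as the initial guess, can be sketched for a single-seed PPR query. This is a minimal illustration under stated assumptions, not the thesis algorithm: it solves x = c e_s + (1 - c) P^T x with in-place sweeps, assumes a directed graph with no self-loops in which every node has at least one out-edge, and uses omega = 1.0 (Gauss-Seidel) by default because the thesis's optimal relaxation weight is not given here. The function name and parameters are hypothetical.

```python
def ppr_sor(out_nbrs, seed, c=0.15, omega=1.0, tol=1e-10, guess=None, max_sweeps=10000):
    """Solve x = c * e_seed + (1 - c) * P^T x by relaxation sweeps.

    Returns (ppr_vector, sweeps_used). `guess` lets a previously computed
    vector be reused as the starting point, e.g. after a small graph update.
    """
    n = len(out_nbrs)
    in_nbrs = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_nbrs):
        for v in nbrs:
            in_nbrs[v].append(u)
    x = list(guess) if guess is not None else [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for i in range(n):
            # Pull scores from in-neighbors, using values already updated this sweep.
            pulled = sum(x[j] / len(out_nbrs[j]) for j in in_nbrs[i])
            target = (c if i == seed else 0.0) + (1.0 - c) * pulled
            new_val = (1.0 - omega) * x[i] + omega * target  # relaxation step
            change += abs(new_val - x[i])
            x[i] = new_val
        if change < tol:  # halt as soon as the error threshold is reached
            return x, sweep
    raise RuntimeError("relaxation did not converge; try omega closer to 1.0")
```

After a graph update, passing the old result as `guess` starts the sweeps near the new solution, and the `tol` argument exposes the accuracy/time trade-off the abstract mentions. Values of omega above 1.0 may speed up convergence on some graphs, but this sketch makes no claim about which value is optimal.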
1 Introduction
2 Preliminaries: Personalized PageRank
2.1 Random Walk, PageRank, and Personalized PageRank
2.1.1 Basics on Random Walk
2.1.2 PageRank
2.1.3 Personalized PageRank
2.2 Characteristics of Personalized PageRank
2.3 Applications of Personalized PageRank
2.4 Previous Work on Personalized PageRank Computation
2.4.1 Basic Algorithms
2.4.2 Enhanced Power Iteration
2.4.3 Bookmark Coloring Algorithm
2.4.4 Dynamic Programming
2.4.5 Monte-Carlo Sampling
2.4.6 Enhanced Direct Solving
2.5 Summary
3 Personalized PageRank Computation with Initial Guess Revision
3.1 Initial Guess Revision and Relaxation
3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
3.3 Initial Guess Construction Algorithm for Personalized PageRank
4 Fully Personalized PageRank Algorithm with Initial Guess Revision
4.1 FPPR with IGR
4.2 Optimization
4.3 Experiments
5 Personalized PageRank Query Processing with Initial Guess Revision
5.1 PPR Query Processing with IGR
5.2 Optimization
5.3 Experiments
6 Conclusion
Bibliography
Appendix
Abstract (In Korean)
Docto