Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme
{\em Personalized PageRank (PPR)} stands as a fundamental proximity measure in graph mining. Since computing an exact single-source PPR (SSPPR) query answer is prohibitively expensive, most existing solutions turn to approximate queries with guarantees. The state-of-the-art solutions for approximate SSPPR queries are index-based and mainly focus on static graphs, while real-world graphs usually change dynamically. However, existing index-update schemes cannot achieve a sub-linear update time. Motivated by this, we present an efficient indexing scheme that maintains the indexed random walks in expected time after each graph update. To reduce space consumption, we further propose a new sampling scheme that removes the auxiliary data structure for vertices while still supporting the index-update cost on evolving graphs. Extensive experiments show that our update scheme achieves orders-of-magnitude speed-ups in update performance over existing index-based dynamic schemes without sacrificing query efficiency.
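The index maintained by such schemes stores pre-sampled random walks whose terminal nodes yield PPR estimates. A minimal sketch of this Monte Carlo walk-index idea, assuming illustrative names and parameters (the naive full rebuild shown here is exactly what the paper's incremental update scheme avoids):

```python
import random

def sample_walk(graph, source, alpha=0.2, rng=random):
    """Run one alpha-decay random walk; return the node where it stops."""
    u = source
    while rng.random() > alpha and graph.get(u):
        u = rng.choice(graph[u])
    return u

def build_index(graph, num_walks=2000, alpha=0.2, rng=random):
    """Pre-sample walk terminals for every node: the 'index'."""
    return {u: [sample_walk(graph, u, alpha, rng) for _ in range(num_walks)]
            for u in graph}

def ppr_estimate(index, s, t):
    """Estimated PPR(s, t): fraction of s's stored walks ending at t."""
    walks = index[s]
    return sum(1 for end in walks if end == t) / len(walks)

random.seed(7)
graph = {0: [1, 2], 1: [2], 2: [0], 3: [0]}  # adjacency lists
index = build_index(graph)
est = ppr_estimate(index, 0, 2)  # proximity of node 2 w.r.t. source 0
```

A query then costs only a scan over the stored walks; the engineering challenge the abstract addresses is refreshing those walks cheaply after each edge update instead of rebuilding the index.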
The Minimum Wiener Connector
The Wiener index of a graph is the sum of all pairwise shortest-path distances between its vertices. In this paper we study the novel problem of finding a minimum Wiener connector: given a connected graph and a set of query vertices, find a subgraph that connects all query vertices and has minimum Wiener index.
We show that the Minimum Wiener Connector admits a polynomial-time (albeit impractical) exact algorithm for the special case where the number of query vertices is bounded. We show that in general the problem is NP-hard and has no PTAS unless P = NP. Our main contribution is a constant-factor approximation algorithm running in time .
Thorough experimentation on a large variety of real-world graphs confirms that our method returns smaller and denser solutions than other methods, and does so by adding to the query set a small number of important vertices (i.e., vertices with high centrality).
Comment: Published in the Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data.
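For a concrete handle on the objective, the Wiener index of an unweighted graph can be computed straight from its definition with one BFS per vertex; a small illustrative helper (not the paper's algorithm):

```python
from collections import deque

def wiener_index(adj):
    """Sum of shortest-path distances over all unordered vertex pairs
    of a connected, unweighted graph given as adjacency lists."""
    total = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:                      # BFS from src
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total // 2                 # each pair was counted twice

# Path a-b-c: d(a,b) + d(b,c) + d(a,c) = 1 + 1 + 2 = 4
path = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
```

The minimum Wiener connector problem then asks for the subgraph spanning the query vertices that minimizes this quantity.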
Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation
A fundamental problem arising in many applications in Web science and social
network analysis is, given an arbitrary approximation factor , to output a
set of nodes that with high probability contains all nodes of PageRank at
least , and no node of PageRank smaller than . We call this
problem {\sc SignificantPageRanks}. We develop a nearly optimal, local
algorithm for the problem with runtime complexity on
networks with nodes. We show that any algorithm for solving this problem
must have runtime of , rendering our algorithm optimal up
to logarithmic factors.
Our algorithm comes with two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest in its own right. In the abstract matrix problem it is assumed that one can access an unknown {\em right-stochastic matrix} by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter . At a cost proportional to , the query returns a list of entries and their indices that provide an -precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least , and omits any column whose sum is less than . Our multi-scale sampling scheme solves this problem with cost , while traditional sampling algorithms would take time .
Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in \cite{JehW03,AndersenCL06} and is highly efficient, particularly for networks with large in-degrees or out-degrees. Together with our multi-scale sampling scheme, it enables us to solve the {\sc SignificantPageRanks} problem optimally.
Comment: Accepted to the Internet Mathematics journal for publication. An extended abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time Algorithm for PageRank Computations".
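As a point of reference for the problem statement, PageRank scores can be estimated by sampling restart-terminated random walks from uniformly random starts and thresholding terminal frequencies. This naive baseline costs time linear in the number of walks, unlike the paper's sublinear multi-scale algorithm; the function name and the threshold parameter `delta` are illustrative:

```python
import random
from collections import Counter

def significant_pageranks(graph, delta, num_walks=20000, alpha=0.15, rng=None):
    """Monte Carlo baseline: estimate PageRank by the terminal-node
    frequencies of restart-terminated walks, then keep nodes >= delta."""
    rng = rng or random.Random(0)
    nodes = list(graph)
    counts = Counter()
    for _ in range(num_walks):
        u = rng.choice(nodes)                # uniform teleport start
        while rng.random() > alpha:          # continue with prob 1 - alpha
            nbrs = graph[u]
            u = rng.choice(nbrs) if nbrs else rng.choice(nodes)
        counts[u] += 1
    return {v for v in nodes if counts[v] / num_walks >= delta}

# Toy graph: node 0 receives most links, while nodes 2 and 3 are reached
# only via teleport and so fall well below a threshold of 0.25.
graph = {0: [1], 1: [0], 2: [0], 3: [0]}
significant = significant_pageranks(graph, delta=0.25)
```

The point of the paper is precisely that this full enumeration of walks can be avoided: its local, multi-scale scheme reaches the same set without visiting the whole network.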
Fast Computation Techniques for Personalized PageRank on Large Graphs
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Sang-goo Lee.
Computation of Personalized PageRank (PPR) in graphs is an important function that is widely utilized in myriad application domains such as search, recommendation, and knowledge discovery. Because the computation of PPR is an expensive process, a good number of innovative and efficient algorithms for computing PPR have been developed. However, efficient computation of PPR within very large graphs with over millions of nodes is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, severely limiting their capability of handling dynamic graphs. In this paper, we present a fast-converging algorithm that guarantees high and controlled precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation and initial guess revision, a vector-reuse strategy. The proposed method vastly improves on the traditional Power Iteration in terms of convergence rate and computation time, while retaining its simplicity and strictness. Since it can reuse previously computed vectors when refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, we can flexibly control the trade-off between accuracy and time, a feature lacking in both sampling-based approximation methods and fully exact methods. Experiments show that the proposed algorithm is at least 20 times faster than Power Iteration and outperforms other state-of-the-art algorithms.
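The approach described in the abstract can be read as a relaxed power iteration with an optional warm start. The sketch below is an illustrative reading, not the thesis's exact algorithm; the function name, the relaxation parameter `omega`, and all defaults are assumptions:

```python
def ppr_power_iteration(P, s, alpha=0.15, omega=1.0, x0=None, tol=1e-8, max_iter=1000):
    """P: row-stochastic matrix as nested lists (P[u][v] = prob of u -> v).
    Iterates p <- alpha*e_s + (1-alpha) * p P with a relaxation blend;
    omega=1 is plain Power Iteration, omega>1 over-relaxes (illustrative).
    x0 seeds the iteration with a previously computed vector (warm start)."""
    n = len(P)
    p = list(x0) if x0 is not None else [1.0 if v == s else 0.0 for v in range(n)]
    for _ in range(max_iter):
        # one power-iteration step: restart mass plus propagated mass
        q = [alpha * (1.0 if v == s else 0.0)
             + (1 - alpha) * sum(p[u] * P[u][v] for u in range(n))
             for v in range(n)]
        # relaxation: blend the old and new iterates
        nxt = [(1 - omega) * p[v] + omega * q[v] for v in range(n)]
        if sum(abs(nxt[v] - p[v]) for v in range(n)) < tol:
            return nxt     # halt as soon as the error threshold is met
        p = nxt
    return p

P = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]  # toy row-stochastic graph
p = ppr_power_iteration(P, 0)               # cold start
p_warm = ppr_power_iteration(P, 0, x0=p)    # warm start from a converged vector
```

Seeding the iteration with a previously converged vector, as in the last line, is what makes refreshing PPR vectors after small graph changes cheap: the iteration stops almost immediately when the old solution is still nearly valid.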
1 Introduction
2 Preliminaries: Personalized PageRank
2.1 Random Walk, PageRank, and Personalized PageRank
2.1.1 Basics on Random Walk
2.1.2 PageRank
2.1.3 Personalized PageRank
2.2 Characteristics of Personalized PageRank
2.3 Applications of Personalized PageRank
2.4 Previous Work on Personalized PageRank Computation
2.4.1 Basic Algorithms
2.4.2 Enhanced Power Iteration
2.4.3 Bookmark Coloring Algorithm
2.4.4 Dynamic Programming
2.4.5 Monte-Carlo Sampling
2.4.6 Enhanced Direct Solving
2.5 Summary
3 Personalized PageRank Computation with Initial Guess Revision
3.1 Initial Guess Revision and Relaxation
3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
3.3 Initial Guess Construction Algorithm for Personalized PageRank
4 Fully Personalized PageRank Algorithm with Initial Guess Revision
4.1 FPPR with IGR
4.2 Optimization
4.3 Experiments
5 Personalized PageRank Query Processing with Initial Guess Revision
5.1 PPR Query Processing with IGR
5.2 Optimization
5.3 Experiments
6 Conclusion
Bibliography
Appendix
Abstract (In Korean)
Accelerating Minimal Perfect Hash Function Construction Using GPU Parallelization
A minimal perfect hash function (MPHF) maps a set of N keys, free of collisions, onto the set [N] := {0, .., N − 1}. This thesis makes a significant contribution to the following generic MPHF construction algorithm. In the first step, the keys are distributed into buckets of varying expected size. We show that choosing the expected bucket size is an optimization problem that can be solved via the Euler-Lagrange equation, which yields a significant improvement over the current state of the art. In the second step, the buckets are ordered primarily by non-increasing size. We show that space consumption improves when buckets of equal size are secondarily ordered by increasing expected size. In the third step, the buckets are processed in this order by finding, for each bucket, a hash function that maps all of the bucket's keys onto [N] without collisions. Finally, an identifier of that hash function is stored in compressed form for each bucket. We present a new compression technique that assigns the identifiers to different encoders such that all identifiers within one encoder follow the same statistical distribution, which improves the compressibility of the identifiers. We exploit the parallel processing power of GPUs to accelerate MPHF construction further. Our GPU implementation constructs an MPHF with 1.73 bits per key in only 36 ns per key, with a CPU query time of 44 ns. Such a low query time combined with such low space consumption is out of reach for the current state of the art, e.g. PTHash. An MPHF with a higher space consumption of 1.88 bits per key is constructed 9926 times faster by our implementation than by PTHash.
Most of our contributions apply beyond our specific implementation and can further improve even state-of-the-art techniques.
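The three construction steps above (bucket assignment, largest-first ordering, per-bucket seed search) can be sketched as follows. The hash function, bucket count, and seed representation are assumptions for illustration only, and the compression of seed identifiers, a core contribution of the thesis, is omitted:

```python
import hashlib

def _h(key, seed, n):
    """Illustrative hash: stdlib hashing of (seed, key) into [0, n)."""
    d = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % n

def build_mphf(keys, num_buckets=None):
    """Sketch of generic bucket-based MPHF construction: distribute keys
    into buckets, order buckets by non-increasing size, then search each
    bucket for a seed placing all of its keys into free slots of [N]."""
    n = len(keys)
    b = num_buckets or max(1, n // 4)
    buckets = [[] for _ in range(b)]
    for k in keys:
        buckets[_h(k, 0, b)].append(k)          # step 1: bucket assignment
    order = sorted(range(b), key=lambda i: -len(buckets[i]))  # step 2
    taken = [False] * n
    seeds = [0] * b
    for i in order:                              # step 3: per-bucket seed search
        seed = 1
        while True:
            slots = [_h(k, seed, n) for k in buckets[i]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                for s in slots:
                    taken[s] = True
                seeds[i] = seed
                break
            seed += 1
    return seeds, b, n

def evaluate(key, seeds, b, n):
    """Look up the seed of the key's bucket and re-hash with it."""
    return _h(key, seeds[_h(key, 0, b)], n)

keys = [f"key{i}" for i in range(30)]            # illustrative key set
seeds, b, n = build_mphf(keys, num_buckets=10)
values = sorted(evaluate(k, seeds, b, n) for k in keys)
```

Processing large buckets first is what keeps the seed search cheap: big buckets get placed while the table is still mostly empty, and the hard-to-place remainder consists only of small buckets.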
Random Walk on Multiple Networks
Random walk is a basic algorithm for exploring the structure of networks and can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks, which contain limited information. In contrast, real data often contain entities of different types and/or from different sources; such data are more comprehensive and are better modeled by multiple networks. To take advantage of the rich information in multiple networks and to make better inferences about entities, in this study we propose random walk on multiple networks (RWM). RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node-visiting probabilities) w.r.t. the starting nodes; walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM, and we propose two approximation methods with theoretical performance guarantees for efficient computation. We apply RWM to link prediction, network embedding, and local community detection. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and
efficiency of RWM.
Comment: Accepted to IEEE TKDE.
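The per-network walker described above computes node-visiting probabilities in the style of random walk with restart. A single-network baseline sketch, with all names illustrative (the cross-network coupling of walkers, RWM's actual contribution, is omitted):

```python
def rwr_scores(adj, start, restart=0.15, iters=200):
    """Random walk with restart: node-visiting probabilities w.r.t. start
    on one network given as adjacency lists (a single-network baseline)."""
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[start] = 1.0
    for _ in range(iters):
        # restart mass always returns to the starting node
        nxt = {v: restart * (1.0 if v == start else 0.0) for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * p[u] / len(adj[u])
                for v in adj[u]:        # spread mass to out-neighbors
                    nxt[v] += share
            else:
                nxt[start] += (1 - restart) * p[u]  # dangling mass restarts
        p = nxt
    return p

# Triangle 0-1-2 with a pendant node 3 that links into the triangle
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [2]}
p = rwr_scores(adj, 0)
```

In RWM, one such walker runs per network and walkers whose visiting distributions agree pull each other's transition behavior closer together; the reinforcement rule itself is beyond this sketch.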