1,291 research outputs found

    Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme

    Personalized PageRank (PPR) is a fundamental proximity measure in graph mining. Since computing an exact answer to a single-source PPR (SSPPR) query is prohibitive, most existing solutions turn to approximate queries with guarantees. The state-of-the-art solutions for approximate SSPPR queries are index-based and mainly focus on static graphs, while real-world graphs are usually dynamically changing. However, existing index-update schemes cannot achieve a sub-linear update time. Motivated by this, we present an efficient indexing scheme to maintain indexed random walks in expected $O(1)$ time after each graph update. To reduce the space consumption, we further propose a new sampling scheme to remove the auxiliary data structure for vertices while still supporting $O(1)$ index update cost on evolving graphs. Extensive experiments show that our update scheme achieves orders of magnitude speed-up on update performance over existing index-based dynamic schemes without sacrificing the query efficiency.
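
To make the idea of answering approximate SSPPR queries from random walks concrete, here is a minimal Monte Carlo sketch. It is not the paper's indexing or update scheme; the function name `monte_carlo_ppr`, the adjacency-dict graph format, the restart probability `alpha`, and the dead-end convention are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def monte_carlo_ppr(adj, source, alpha=0.2, num_walks=10000, seed=0):
    """Estimate single-source PPR by simulating random walks with restart.

    adj maps each vertex to a list of out-neighbours (assumed format).
    Every walk starts at `source` and stops with probability `alpha` before
    each step; the fraction of walks that end at v estimates the PPR of v.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        v = source
        while rng.random() >= alpha:
            nbrs = adj.get(v, [])
            if not nbrs:
                # Dead end: stop the walk here (one common convention).
                break
            v = rng.choice(nbrs)
        ends[v] += 1
    return {v: count / num_walks for v, count in ends.items()}


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0]}
    estimate = monte_carlo_ppr(graph, source=0)
    for vertex in sorted(estimate):
        print(vertex, round(estimate[vertex], 3))
```

An index-based method would store such walks ahead of time and repair only the affected ones after an edge insertion or deletion, which is the part the abstract claims to do in expected $O(1)$ time.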

    The Minimum Wiener Connector

    The Wiener index of a graph is the sum of all pairwise shortest-path distances between its vertices. In this paper we study the novel problem of finding a minimum Wiener connector: given a connected graph $G=(V,E)$ and a set $Q \subseteq V$ of query vertices, find a subgraph of $G$ that connects all query vertices and has minimum Wiener index. We show that the Minimum Wiener Connector problem admits a polynomial-time (albeit impractical) exact algorithm for the special case where the number of query vertices is bounded. We show that in general the problem is NP-hard, and has no PTAS unless $\mathbf{P} = \mathbf{NP}$. Our main contribution is a constant-factor approximation algorithm running in time $\widetilde{O}(|Q||E|)$. Thorough experimentation on a large variety of real-world graphs confirms that our method returns smaller and denser solutions than other methods, and does so by adding to the query set $Q$ a small number of important vertices (i.e., vertices with high centrality). Comment: Published in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data.
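
The objective being minimised can be illustrated with a short sketch that computes the Wiener index of a small connected, unweighted, undirected graph by running a breadth-first search from every vertex. The function name `wiener_index` and the adjacency-dict format are assumptions for illustration; this computes only the index, not the connector algorithm from the paper.

```python
from collections import deque


def wiener_index(adj):
    """Sum of shortest-path distances over all unordered vertex pairs.

    adj maps each vertex to its neighbours; the graph is assumed to be
    connected, unweighted, and undirected (each edge listed both ways).
    """
    total = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        total += sum(dist.values())
    # Every unordered pair was counted twice, once from each endpoint.
    return total // 2


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
    print(wiener_index(path))  # 1+2+3+1+2+1 = 10
    print(wiener_index(star))  # 3*1 + 3*2 = 9
```

The star has a lower index than the path on the same number of vertices, which matches the abstract's observation that good connectors route through a few central vertices.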

    Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation

    A fundamental problem arising in many applications in Web science and social network analysis is, given an arbitrary approximation factor $c>1$, to output a set $S$ of nodes that with high probability contains all nodes of PageRank at least $\Delta$, and no node of PageRank smaller than $\Delta/c$. We call this problem SignificantPageRanks. We develop a nearly optimal, local algorithm for the problem with runtime complexity $\tilde{O}(n/\Delta)$ on networks with $n$ nodes. We show that any algorithm for solving this problem must have runtime of $\Omega(n/\Delta)$, rendering our algorithm optimal up to logarithmic factors. Our algorithm comes with two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest on its own. In the abstract matrix problem it is assumed that one can access an unknown right-stochastic matrix by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter $\epsilon$. At a cost proportional to $1/\epsilon$, the query will return a list of $O(1/\epsilon)$ entries and their indices that provide an $\epsilon$-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least $\Delta$, and omits any column whose sum is less than $\Delta/c$. Our multi-scale sampling scheme solves this problem with cost $\tilde{O}(n/\Delta)$, while traditional sampling algorithms would take time $\Theta((n/\Delta)^2)$. Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in [JehW03, AndersenCL06] and is highly efficient particularly for networks with large in-degrees or out-degrees. Together with our multi-scale sampling scheme we are able to optimally solve the SignificantPageRanks problem. Comment: Accepted to Internet Mathematics journal for publication. An extended abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time Algorithm for PageRank Computations".

    Fast Computation Techniques for Personalized PageRank on Large Graphs

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 이상구.

    Computation of Personalized PageRank (PPR) in graphs is an important function that is widely utilized in myriad application domains such as search, recommendation, and knowledge discovery. Because the computation of PPR is an expensive process, a good number of innovative and efficient algorithms for computing PPR have been developed. However, efficient computation of PPR within very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their capability of handling dynamic graphs. In this thesis, we present a fast-converging algorithm that guarantees high and controlled precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation and initial guess revision, a vector-reuse strategy. The proposed method vastly improves on the traditional Power Iteration in terms of convergence rate and computation time, while retaining its simplicity and strictness. Since it can reuse previously computed vectors when refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, we can flexibly control the trade-off between accuracy and time, a feature lacking in both sampling-based approximation methods and fully exact (matrix-inversion-based) methods. Experiments show that the proposed algorithm is at least 20 times faster than the Power Iteration and outperforms other state-of-the-art algorithms.

    Contents:
    1 Introduction
    2 Preliminaries: Personalized PageRank
      2.1 Random Walk, PageRank, and Personalized PageRank
        2.1.1 Basics on Random Walk
        2.1.2 PageRank
        2.1.3 Personalized PageRank
      2.2 Characteristics of Personalized PageRank
      2.3 Applications of Personalized PageRank
      2.4 Previous Work on Personalized PageRank Computation
        2.4.1 Basic Algorithms
        2.4.2 Enhanced Power Iteration
        2.4.3 Bookmark Coloring Algorithm
        2.4.4 Dynamic Programming
        2.4.5 Monte-Carlo Sampling
        2.4.6 Enhanced Direct Solving
      2.5 Summary
    3 Personalized PageRank Computation with Initial Guess Revision
      3.1 Initial Guess Revision and Relaxation
      3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
      3.3 Initial Guess Construction Algorithm for Personalized PageRank
    4 Fully Personalized PageRank Algorithm with Initial Guess Revision
      4.1 FPPR with IGR
      4.2 Optimization
      4.3 Experiments
    5 Personalized PageRank Query Processing with Initial Guess Revision
      5.1 PPR Query Processing with IGR
      5.2 Optimization
      5.3 Experiments
    6 Conclusion
    Bibliography
    Appendix
    Abstract (In Korean)
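
The two properties the abstract emphasises, stopping at a chosen error threshold and reusing an earlier vector as the starting point, can be sketched with plain power iteration. This is not the thesis's algorithm: it omits successive over-relaxation and the initial guess construction, and the function name `ppr_power_iteration`, the parameter names, and the requirement that every vertex has at least one out-neighbour are assumptions for illustration.

```python
def ppr_power_iteration(adj, source, alpha=0.15, tol=1e-10, init=None, max_iter=10000):
    """Iterate x <- alpha * e_source + (1 - alpha) * P^T x until the L1 change < tol.

    adj maps each vertex to a non-empty list of out-neighbours (assumed).
    init, if given, is a previously computed vector used as a warm start.
    Returns the vector and the number of iterations performed.
    """
    nodes = list(adj)
    if init is not None:
        x = {v: init.get(v, 0.0) for v in nodes}
    else:
        x = {v: (1.0 if v == source else 0.0) for v in nodes}
    for iteration in range(1, max_iter + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass returns to the source
        for u in nodes:
            share = (1.0 - alpha) * x[u] / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        err = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if err < tol:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0]}
    cold, cold_iters = ppr_power_iteration(graph, source=0)
    warm, warm_iters = ppr_power_iteration(graph, source=0, init=cold)
    print(cold_iters, warm_iters)
```

Restarting from an already converged vector on an unchanged graph stops after a single iteration. After a small graph update, a warm start typically needs fewer iterations than a cold start, though this sketch does not guarantee it.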

    Accelerating Minimal Perfect Hash Function Construction Using GPU Parallelization

    A minimal perfect hash function (MPHF) maps a set of N keys without collisions onto the set [N] := {0, ..., N − 1}. This thesis makes a significant contribution to the following generic MPHF construction algorithm. In the first step, the keys are distributed into buckets of differing expected size. We show that choosing the expected bucket size is an optimization problem that can be solved with the Euler-Lagrange equation. This results in a significant improvement over the current state of the art. In the second step, the buckets are sorted primarily by non-increasing size. We show that space consumption improves when buckets of equal size are sorted secondarily by increasing expected size. In the third step, the buckets are processed in this order by finding, for each bucket, a hash function that maps all keys of the bucket onto [N] without collisions. Finally, an identifier of the hash function is stored in compressed form for each bucket. We present a new compression technique that assigns the identifiers to different encoders such that all identifiers within one encoder follow the same statistical distribution. This improves the compressibility of the identifiers. We exploit the parallel processing power of GPUs to accelerate MPHF construction further. Our GPU implementation constructs an MPHF with 1.73 bits per key in only 36 ns per key, with a CPU query time of 44 ns. Such a low query time combined with low space consumption is not achievable with the current state of the art, e.g. PTHash. An MPHF with a higher space consumption of 1.88 bits per key is constructed by our implementation 9926 times faster than by PTHash. Most of our contributions are applicable beyond our specific implementation and can further improve even state-of-the-art techniques.
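
The generic construction steps described above (bucket the keys, order buckets by non-increasing size, search a collision-free hash function per bucket, store its identifier) can be sketched on the CPU as follows. This is a simplified illustration, not the thesis's implementation: it uses uniform expected bucket sizes, stores the identifiers (seeds) uncompressed, and has no GPU parallelism. The names `build_mphf`, `query_mphf`, `_hash64` and the use of BLAKE2b as the hash are assumptions.

```python
import hashlib


def _hash64(key, seed):
    """64-bit hash of a string key under an integer seed (BLAKE2b stand-in)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def build_mphf(keys, avg_bucket_size=4, max_seed=1_000_000):
    """Return one seed per bucket such that all keys map to distinct slots in [N]."""
    n = len(keys)
    num_buckets = max(1, n // avg_bucket_size)
    # Step 1: distribute keys into buckets (uniformly here; seed 0 is reserved).
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[_hash64(key, 0) % num_buckets].append(key)
    # Step 2: process buckets in order of non-increasing size.
    order = sorted(range(num_buckets), key=lambda b: len(buckets[b]), reverse=True)
    taken = [False] * n
    seeds = [0] * num_buckets
    # Step 3: for each bucket, search a seed that places its keys on free slots.
    for b in order:
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            slots = [_hash64(key, seed) % n for key in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                for s in slots:
                    taken[s] = True
                seeds[b] = seed  # Step 4: store the identifier (uncompressed)
                break
        else:
            raise RuntimeError("no seed found; increase max_seed")
    return seeds


def query_mphf(key, seeds, n):
    """Evaluate the MPHF: find the key's bucket, then apply that bucket's seed."""
    bucket = _hash64(key, 0) % len(seeds)
    return _hash64(key, seeds[bucket]) % n


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(100)]
    seeds = build_mphf(keys)
    values = sorted(query_mphf(k, seeds, len(keys)) for k in keys)
    print(values == list(range(len(keys))))
```

Processing large buckets first matters because they need several free slots at once, which is only likely while the table is mostly empty. The small buckets handled last tend to need larger seeds, which is why the choice of bucket sizes and the encoding of the seeds affect the final space usage.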

    Random Walk on Multiple Networks

    Random walk is a basic algorithm for exploring the structure of networks and can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited information. In contrast, real data often contain entities of different types and/or from different sources, which are comprehensive and can be better modeled by multiple networks. To take advantage of the rich information in multiple networks and make better inferences on entities, in this study we propose random walk on multiple networks, RWM. RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node visiting probabilities) w.r.t. the starting nodes. Walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM. Two approximation methods with theoretical performance guarantees are proposed for efficient computation. We apply RWM in link prediction, network embedding, and local community detection. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of RWM. Comment: Accepted to IEEE TKDE.