Search CORE

1,291 research outputs found

Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme

Author: Guo Qintian
Hou Guanhao
Wang Sibo
Wei Zhewei
Zhang Fangyuan
Publication venue
Publication date: 25/12/2022
Field of study

{\em Personalized PageRank (PPR)} stands as a fundamental proximity measure in graph mining. Since computing an exact SSPPR query answer is prohibitive, most existing solutions turn to approximate queries with guarantees. The state-of-the-art solutions for approximate SSPPR queries are index-based and mainly focus on static graphs, while real-world graphs are usually dynamically changing. However, existing index-update schemes can not achieve a sub-linear update time. Motivated by this, we present an efficient indexing scheme to maintain indexed random walks in expected

O(1)

time after each graph update. To reduce the space consumption, we further propose a new sampling scheme to remove the auxiliary data structure for vertices while still supporting

O(1)

index update cost on evolving graphs. Extensive experiments show that our update scheme achieves orders of magnitude speed-up on update performance over existing index-based dynamic schemes without sacrificing the query efficiency

arXiv.org e-Print Archive

The Minimum Wiener Connector

Author: Burt R.
Hwang D. S. R.
Jacobs K. M.
Stefanovic D.
Vogelstein B.
Zhang X.-D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/10/2016
Field of study

The Wiener index of a graph is the sum of all pairwise shortest-path distances between its vertices. In this paper we study the novel problem of finding a minimum Wiener connector: given a connected graph

G=(V,E)

and a set

Q\subseteq V

of query vertices, find a subgraph of

G

that connects all query vertices and has minimum Wiener index. We show that The Minimum Wiener Connector admits a polynomial-time (albeit impractical) exact algorithm for the special case where the number of query vertices is bounded. We show that in general the problem is NP-hard, and has no PTAS unless

\mathbf{P} = \mathbf{NP}

. Our main contribution is a constant-factor approximation algorithm running in time

\widetilde{O}(|Q||E|)

. A thorough experimentation on a large variety of real-world graphs confirms that our method returns smaller and denser solutions than other methods, and does so by adding to the query set

Q

a small number of important vertices (i.e., vertices with high centrality).Comment: Published in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Dat

arXiv.org e-Print Archive

Crossref

Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation

Author: Borgs Christian
Brautbar Michael
Chayes Jennifer
Teng Shang-Hua
Publication venue
Publication date: 01/01/1202
Field of study

A fundamental problem arising in many applications in Web science and social network analysis is, given an arbitrary approximation factor

c>1

, to output a set

S

of nodes that with high probability contains all nodes of PageRank at least

\Delta

, and no node of PageRank smaller than

\Delta/c

. We call this problem {\sc SignificantPageRanks}. We develop a nearly optimal, local algorithm for the problem with runtime complexity

\tilde{O}(n/\Delta)

on networks with

n

nodes. We show that any algorithm for solving this problem must have runtime of

{\Omega}(n/\Delta)

, rendering our algorithm optimal up to logarithmic factors. Our algorithm comes with two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest on its own. In the abstract matrix problem it is assumed that one can access an unknown {\em right-stochastic matrix} by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter

\epsilon

. At a cost propositional to

1/\epsilon

, the query will return a list of

O(1/\epsilon)

entries and their indices that provide an

\epsilon

-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least

\Delta

, and omits any column whose sum is less than

\Delta/c

. Our multi-scale sampling scheme solves this problem with cost

\tilde{O}(n/\Delta)

, while traditional sampling algorithms would take time

\Theta((n/\Delta)^2)

. Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in \cite{JehW03,AndersenCL06} and is highly efficient particularly for networks with large in-degrees or out-degrees. Together with our multiscale sampling scheme we are able to optimally solve the {\sc SignificantPageRanks} problem.Comment: Accepted to Internet Mathematics journal for publication. An extended abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time Algorithm for PageRank Computations

arXiv.org e-Print Archive

CiteSeerX

큰 그래프 상에서의 개인화된 페이지 랭크에 대한 빠른 계산 기법

Author: 박성찬
Publication venue: 서울대학교 대학원
Publication date: 01/08/2020
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2020. 8. 이상구.Computation of Personalized PageRank (PPR) in graphs is an important function that is widely utilized in myriad application domains such as search, recommendation, and knowledge discovery. Because the computation of PPR is an expensive process, a good number of innovative and efficient algorithms for computing PPR have been developed. However, efficient computation of PPR within very large graphs with over millions of nodes is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, thus, severely limiting their capability of handling dynamic graphs. In this paper, we present a fast converging algorithm that guarantees high and controlled precision. We improve the convergence rate of traditional Power Iteration method by adopting successive over-relaxation, and initial guess revision, a vector reuse strategy. The proposed method vastly improves on the traditional Power Iteration in terms of convergence rate and computation time, while retaining its simplicity and strictness. Since it can reuse the previously computed vectors for refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, we can flexibly control the trade-off between accuracy and time, a feature lacking in both sampling-based approximation methods and fully exact methods. Experiments show that the proposed algorithm is at least 20 times faster than the Power Iteration and outperforms other state-of-the-art algorithms.그래프 내에서 개인화된 페이지랭크 (P ersonalized P age R ank, PPR 를 계산하는 것은 검색 , 추천 , 지식발견 등 여러 분야에서 광범위하게 활용되는 중요한 작업 이다 . 개인화된 페이지랭크를 계산하는 것은 고비용의 과정이 필요하므로 , 개인화된 페이지랭크를 계산하는 효율적이고 혁신적인 방법들이 다수 개발되어왔다 . 그러나 수백만 이상의 노드를 가진 대용량 그래프에 대한 효율적인 계산은 여전히 해결되지 않은 문제이다 . 그에 더하여 , 기존 제시된 알고리듬들은 그래프 갱신을 효율적으로 다루지 못하여 동적으로 변화하는 그래프를 다루는 데에 한계점이 크다 . 본 연구에서는 높은 정밀도를 보장하고 정밀도를 통제 가능한 , 빠르게 수렴하는 개인화된 페이지랭크 계산 알고리듬을 제시한다 . 전통적인 거듭제곱법 (Power 에 축차가속완화법 (Successive Over Relaxation) 과 초기 추측 값 보정법 (Initial Guess 을 활용한 벡터 재사용 전략을 적용하여 수렴 속도를 개선하였다 . 제시된 방법은 기존 거듭제곱법의 장점인 단순성과 엄밀성을 유지 하면서 도 수렴율과 계산속도를 크게 개선 한다 . 또한 개인화된 페이지랭크 벡터의 갱신을 위하여 이전에 계산 되어 저장된 벡터를 재사용하 여 , 갱신 에 드는 시간이 크게 단축된다 . 본 방법은 주어진 오차 한계에 도달하는 즉시 결과값을 산출하므로 정확도와 계산시간을 유연하게 조절할 수 있으며 이는 표본 기반 추정방법이나 정확한 값을 산출하는 역행렬 기반 방법 이 가지지 못한 특성이다 . 실험 결과 , 본 방법은 거듭제곱법에 비하여 20 배 이상 빠르게 수렴한다는 것이 확인되었으며 , 기 제시된 최고 성능 의 알고리 듬 보다 우수한 성능을 보이는 것 또한 확인되었다1 Introduction 1 2 Preliminaries: Personalized PageRank 4 2.1 Random Walk, PageRank, and Personalized PageRank. 5 2.1.1 Basics on Random Walk 5 2.1.2 PageRank. 6 2.1.3 Personalized PageRank 8 2.2 Characteristics of Personalized PageRank. 9 2.3 Applications of Personalized PageRank. 12 2.4 Previous Work on Personalized PageRank Computation. 17 2.4.1 Basic Algorithms 17 2.4.2 Enhanced Power Iteration 18 2.4.3 Bookmark Coloring Algorithm. 20 2.4.4 Dynamic Programming 21 2.4.5 Monte-Carlo Sampling. 22 2.4.6 Enhanced Direct Solving 24 2.5 Summary 26 3 Personalized PageRank Computation with Initial Guess Revision 30 3.1 Initial Guess Revision and Relaxation 30 3.2 Finding Optimal Weight of Successive Over Relaxation for PPR. 34 3.3 Initial Guess Construction Algorithm for Personalized PageRank. 36 4 Fully Personalized PageRank Algorithm with Initial Guess Revision 42 4.1 FPPR with IGR. 42 4.2 Optimization. 49 4.3 Experiments. 52 5 Personalized PageRank Query Processing with Initial Guess Revision 56 5.1 PPR Query Processing with IGR 56 5.2 Optimization. 64 5.3 Experiments. 67 6 Conclusion 74 Bibliography 77 Appendix 88 Abstract (In Korean) 90Docto

SNU Open Repository and Archive

Accelerating Minimal Perfect Hash Function Construction Using GPU Parallelization

Author: Hermann Stefan
Publication venue: Karlsruher Institut für Technologie
Publication date: 15/11/2023
Field of study

Eine Minimale Perfekte Hashfunktion (MPHF) bildet eine Menge von N Schlüsseln kollisionsfrei auf die Menge [N ] := {0, .., N − 1} ab. Diese Thesis leistet einen signifikanten Beitrag für den folgenden generischen MPHF Konstruktionsalgorithmus. Im ersten Schritt werden die Schlüssel in Buckets unterschiedlicher erwarteter Größe verteilt. Wir zeigen, dass die Wahl der erwarteten Bucketgröße ein Optimierungsproblem darstellt welches durch die Euler-Lagrange Gleichung gelöst werden kann. Dies resultiert in eine signifikante Verbesserung im Vergleich zum derzeitigen Stand der Forschung. Im zweiten Schritt werden die Buckets primär in nicht aufsteigender Größe geordnet. Wir zeigen, dass der Platzbedarf verbessert wird wenn Buckets gleicher Größe sekundär in aufsteigender Erwartungsgröße angeordnet werden. Die Buckets werden dann im dritten Schritt in dieser Reihenfolge verarbeitet indem eine Hashfunktion gefunden wird welche alle Schlüssel des Buckets kollisionsfrei auf [N ] abbildet. Abschließend wird für jeden Bucket ein Identifikator der Hashfunktion komprimiert gespeichert. Wir präsentieren eine neue Kompressionstechnik, welche die Identifikatoren in unterschiedliche Enkodierer anordnet, sodass alle Identifikatoren innerhalb eines Enkodierers der gleichen statistischen Verteilung folgen. Dies verbessert die Komprimierbarkeit der Identifikatoren. Wir nutzen die parallele Leistungsfähigkeit von GPUs um die Konstruktion von MPHFs weiter zu beschleunigen. Unsere GPU Implementierung konstruiert eine MPHF mit 1,73 Bits pro Schlüssel in nur 36 ns pro Schlüssel mit einer CPU Abfragezeit von 44 ns. Eine solch geringe Abfragezeit bei gleichzeitig niedrigem Platzbedarf ist nach heutigem Stand, wie z.B. mit PTHash, nicht erreichbar. Eine MPHF, die einen höheren Platzbedarf von 1,88 Bits pro Schlüssel aufweist, wird mit unserer Implementierung 9926 mal schneller konstruiert als durch PTHash. Die meisten unserer Beiträge sind über unsere spezifische Implementierung hinaus anwendbar und können selbst modernste Techniken weiter verbessern

KITopen

Random Walk on Multiple Networks

Author: Bian Yuchen
Huan Jun
Liu Xiao
Luo Dongsheng
Yan Yaowei
Yu Xiong
Zhang Xiang
Publication venue
Publication date: 04/07/2023
Field of study

Random Walk is a basic algorithm to explore the structure of networks, which can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited information. In contrast, real data often contain entities with different types or/and from different sources, which are comprehensive and can be better modeled by multiple networks. To take advantage of rich information in multiple networks and make better inferences on entities, in this study, we propose random walk on multiple networks, RWM. RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node visiting probabilities) w.r.t. the starting nodes. Walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM. Two approximation methods with theoretical performance guarantees are proposed for efficient computation. We apply RWM in link prediction, network embedding, and local community detection. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of RWM.Comment: Accepted to IEEE TKD

arXiv.org e-Print Archive