Search CORE

10 research outputs found

Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees

Author: Jin Woojeong
Kang U
Yoon Minji
Publication venue
Publication date: 02/12/2017
Field of study

Given a time-evolving graph, how can we track similarity between nodes in a fast and accurate way, with theoretical guarantees on the convergence and the error? Random Walk with Restart (RWR) is a popular measure to estimate the similarity between nodes and has been exploited in numerous applications. Many real-world graphs are dynamic with frequent insertion/deletion of edges; thus, tracking RWR scores on dynamic graphs in an efficient way has aroused much interest among data mining researchers. Recently, dynamic RWR models based on the propagation of scores across a given graph have been proposed, and have succeeded in outperforming previous other approaches to compute RWR dynamically. However, those models fail to guarantee exactness and convergence time for updating RWR in a generalized form. In this paper, we propose OSP, a fast and accurate algorithm for computing dynamic RWR with insertion/deletion of nodes/edges in a directed/undirected graph. When the graph is updated, OSP first calculates offset scores around the modified edges, propagates the offset scores across the updated graph, and then merges them with the current RWR scores to get updated RWR scores. We prove the exactness of OSP and introduce OSP-T, a version of OSP which regulates a trade-off between accuracy and computation time by using error tolerance {\epsilon}. Given restart probability c, OSP-T guarantees to return RWR scores with O ({\epsilon} /c ) error in O (log ({\epsilon}/2)/log(1-c)) iterations. Through extensive experiments, we show that OSP tracks RWR exactly up to 4605x faster than existing static RWR method on dynamic graphs, and OSP-T requires up to 15x less time with 730x lower L1 norm error and 3.3x lower rank error than other state-of-the-art dynamic RWR methods.Comment: 10 pages, 8 figure

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive

DPPIN: A Biological Dataset of Dynamic Protein-Protein Interaction Networks

Author: Fu Dongqi
He Jingrui
Publication venue
Publication date: 05/07/2021
Field of study

Nowadays, many network representation learning algorithms and downstream network mining tasks have already paid attention to dynamic networks or temporal networks, which are more suitable for real-world complex scenarios by modeling evolving patterns and temporal dependencies between node interactions. Moreover, representing and mining temporal networks have a wide range of applications, such as fraud detection, social network analysis, and drug discovery. To contribute to the network representation learning and network mining research community, in this paper, we generate a new biological dataset of dynamic protein-protein interaction networks (i.e., DPPIN), which consists of twelve dynamic protein-level interaction networks of yeast cells at different scales. We first introduce the generation process of DPPIN. To demonstrate the value of our published dataset DPPIN, we then list the potential applications that would be benefited. Furthermore, we design dynamic local clustering, dynamic spectral clustering, dynamic subgraph matching, dynamic node classification, and dynamic graph classification experiments, where DPPIN indicates future research opportunities for some tasks by presenting challenges on state-of-the-art baseline algorithms. Finally, we identify future directions for improving this dataset utility and welcome inputs from the community. All resources of this work are deployed and publicly available at https://github.com/DongqiFu/DPPIN

arXiv.org e-Print Archive

Temporal walk based centrality metric for graph streams

Author: Benczúr András
Béres Ferenc
Oláh A
Pálovics R
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

Abstract A plethora of centrality measures or rankings have been proposed to account for the importance of the nodes of a network. In the seminal study of Boldi and Vigna (2014), the comparative evaluation of centrality measures was termed a difficult, arduous task. In networks with fast dynamics, such as the Twitter mention or retweet graphs, predicting emerging centrality is even more challenging. Our main result is a new, temporal walk based dynamic centrality measure that models temporal information propagation by considering the order of edge creation. Dynamic centrality measures have already started to emerge in publications; however, their empirical evaluation is limited. One of our main contributions is creating a quantitative experiment to assess temporal centrality metrics. In this experiment, our new measure outperforms graph snapshot based static and other recently proposed dynamic centrality measures in assigning the highest time-aware centrality to the actually relevant nodes of the network. Additional experiments over different data sets show that our method perform well for detecting concept drift in the process that generates the graphs

SZTAKI Publication Repository

Directory of Open Access Journals

Virtual web for PageRank computing

Author: Song Bo, 1987-
Publication venue: 'University of Missouri Libraries'
Publication date
Field of study

Dr. Xinhua Zhuang, Dissertation Supervisor.Includes vita.Field of study: Computer science."May 2018."The enormous size and fast-evolving nature of World-Wide-Web has been demanding an even more efficient PageRank updating algorithm. Web evolution may involve two kinds: (1) link structure modification; (2) page insertion/deletion. When the web evolution is restricted to only link insertion/deletion, we demonstrate the benefit of using the previous PageRank to initialize the current PageRank computation, theoretically and experimentally. When page insertion/deletion occurs, how to effectively use the previous PageRank information to facilitate the current PageRank computation has long been a challenge. To tackle the general case, a so-called "virtual web" is introduced through adding the inserted nodes to the previous web along with some specific "in-home" link structure, where in-links from the previous web and out-links to the previous web are excluded. Through the virtual web, we are able to work out a virtual initialization, which can be efficiently used to calculate the current PageRank. The introduced virtual initialization is "unbiased", that assumes least under available knowledge. The virtual web is then integrated with the Power-Iteration and Gauss-Southwell method to solve the node insertion/deletion problem, which are named as Virtual Web Power-Iteration (VWPI) method and Virtual Web Gauss-Southwell (VWGS) method, respectively. Further, we proposed an optimized approach based on VWGS method for updating node insertions. The experiment result shows that the VWGS algorithm significantly outperformed the conventional PageRank computation based on the original model. On the dataset Twitter-2010 with 42M nodes and 1.5B edges, for a perturbation of 400k node and 14 million link insertions plus deletions at one time, our algorithm is about 20 times faster on number of iterations and 3 times faster on running-time in comparison to the Gauss-Southwell method starting from scratch. On the soc-LiveJournal dataset with up to a 20% node insertion, the optimized VWGS method received another 28% gain comparing to the original VWGS method. To compare with the prior work proposed by Ohsaka et al. in [32], our method is 1800x faster per link insertion/deletion on the Twitter-2010 dataset under similar experiment environment.Includes bibliographical references (pages 90-93)

University of Missouri: MOspace

큰 그래프 상에서의 개인화된 페이지 랭크에 대한 빠른 계산 기법

Author: 박성찬
Publication venue: 서울대학교 대학원
Publication date: 01/08/2020
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2020. 8. 이상구.Computation of Personalized PageRank (PPR) in graphs is an important function that is widely utilized in myriad application domains such as search, recommendation, and knowledge discovery. Because the computation of PPR is an expensive process, a good number of innovative and efficient algorithms for computing PPR have been developed. However, efficient computation of PPR within very large graphs with over millions of nodes is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, thus, severely limiting their capability of handling dynamic graphs. In this paper, we present a fast converging algorithm that guarantees high and controlled precision. We improve the convergence rate of traditional Power Iteration method by adopting successive over-relaxation, and initial guess revision, a vector reuse strategy. The proposed method vastly improves on the traditional Power Iteration in terms of convergence rate and computation time, while retaining its simplicity and strictness. Since it can reuse the previously computed vectors for refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, we can flexibly control the trade-off between accuracy and time, a feature lacking in both sampling-based approximation methods and fully exact methods. Experiments show that the proposed algorithm is at least 20 times faster than the Power Iteration and outperforms other state-of-the-art algorithms.그래프 내에서 개인화된 페이지랭크 (P ersonalized P age R ank, PPR 를 계산하는 것은 검색 , 추천 , 지식발견 등 여러 분야에서 광범위하게 활용되는 중요한 작업 이다 . 개인화된 페이지랭크를 계산하는 것은 고비용의 과정이 필요하므로 , 개인화된 페이지랭크를 계산하는 효율적이고 혁신적인 방법들이 다수 개발되어왔다 . 그러나 수백만 이상의 노드를 가진 대용량 그래프에 대한 효율적인 계산은 여전히 해결되지 않은 문제이다 . 그에 더하여 , 기존 제시된 알고리듬들은 그래프 갱신을 효율적으로 다루지 못하여 동적으로 변화하는 그래프를 다루는 데에 한계점이 크다 . 본 연구에서는 높은 정밀도를 보장하고 정밀도를 통제 가능한 , 빠르게 수렴하는 개인화된 페이지랭크 계산 알고리듬을 제시한다 . 전통적인 거듭제곱법 (Power 에 축차가속완화법 (Successive Over Relaxation) 과 초기 추측 값 보정법 (Initial Guess 을 활용한 벡터 재사용 전략을 적용하여 수렴 속도를 개선하였다 . 제시된 방법은 기존 거듭제곱법의 장점인 단순성과 엄밀성을 유지 하면서 도 수렴율과 계산속도를 크게 개선 한다 . 또한 개인화된 페이지랭크 벡터의 갱신을 위하여 이전에 계산 되어 저장된 벡터를 재사용하 여 , 갱신 에 드는 시간이 크게 단축된다 . 본 방법은 주어진 오차 한계에 도달하는 즉시 결과값을 산출하므로 정확도와 계산시간을 유연하게 조절할 수 있으며 이는 표본 기반 추정방법이나 정확한 값을 산출하는 역행렬 기반 방법 이 가지지 못한 특성이다 . 실험 결과 , 본 방법은 거듭제곱법에 비하여 20 배 이상 빠르게 수렴한다는 것이 확인되었으며 , 기 제시된 최고 성능 의 알고리 듬 보다 우수한 성능을 보이는 것 또한 확인되었다1 Introduction 1 2 Preliminaries: Personalized PageRank 4 2.1 Random Walk, PageRank, and Personalized PageRank. 5 2.1.1 Basics on Random Walk 5 2.1.2 PageRank. 6 2.1.3 Personalized PageRank 8 2.2 Characteristics of Personalized PageRank. 9 2.3 Applications of Personalized PageRank. 12 2.4 Previous Work on Personalized PageRank Computation. 17 2.4.1 Basic Algorithms 17 2.4.2 Enhanced Power Iteration 18 2.4.3 Bookmark Coloring Algorithm. 20 2.4.4 Dynamic Programming 21 2.4.5 Monte-Carlo Sampling. 22 2.4.6 Enhanced Direct Solving 24 2.5 Summary 26 3 Personalized PageRank Computation with Initial Guess Revision 30 3.1 Initial Guess Revision and Relaxation 30 3.2 Finding Optimal Weight of Successive Over Relaxation for PPR. 34 3.3 Initial Guess Construction Algorithm for Personalized PageRank. 36 4 Fully Personalized PageRank Algorithm with Initial Guess Revision 42 4.1 FPPR with IGR. 42 4.2 Optimization. 49 4.3 Experiments. 52 5 Personalized PageRank Query Processing with Initial Guess Revision 56 5.1 PPR Query Processing with IGR 56 5.2 Optimization. 64 5.3 Experiments. 67 6 Conclusion 74 Bibliography 77 Appendix 88 Abstract (In Korean) 90Docto

SNU Open Repository and Archive

Sequence queries on temporal graphs

Author: Zhu Haohan
Publication venue
Publication date: 21/06/2016
Field of study

Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC

Boston University Institutional Repository (OpenBU)