
    Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees

    Given a time-evolving graph, how can we track similarity between nodes quickly and accurately, with theoretical guarantees on convergence and error? Random Walk with Restart (RWR) is a popular measure of similarity between nodes and has been exploited in numerous applications. Many real-world graphs are dynamic, with frequent insertions and deletions of edges; thus, tracking RWR scores on dynamic graphs efficiently has attracted much interest among data mining researchers. Recently, dynamic RWR models based on propagating scores across a given graph have been proposed, and they outperform earlier approaches to computing RWR dynamically. However, those models do not guarantee exactness or bounded convergence time when updating RWR in a generalized form. In this paper, we propose OSP, a fast and accurate algorithm for computing dynamic RWR under insertion/deletion of nodes/edges in a directed or undirected graph. When the graph is updated, OSP first calculates offset scores around the modified edges, propagates the offset scores across the updated graph, and then merges them with the current RWR scores to obtain the updated RWR scores. We prove the exactness of OSP and introduce OSP-T, a version of OSP that regulates the trade-off between accuracy and computation time through an error tolerance $\epsilon$. Given restart probability $c$, OSP-T is guaranteed to return RWR scores with $O(\epsilon/c)$ error in $O(\log(\epsilon/2)/\log(1-c))$ iterations. Through extensive experiments, we show that OSP tracks RWR exactly up to 4605x faster than an existing static RWR method on dynamic graphs, and that OSP-T requires up to 15x less time, with 730x lower L1-norm error and 3.3x lower rank error, than other state-of-the-art dynamic RWR methods.
    Comment: 10 pages, 8 figures
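    To make the offset-propagation idea concrete, here is a minimal Python sketch under our own assumptions, not the authors' OSP code. RWR is taken as the fixed point r = (1 - c) A^T r + c q with a row-normalised adjacency A. Subtracting the old and new fixed-point equations gives an offset Δ = (1 - c) A_new^T Δ + (1 - c)(A_new^T - A_old^T) r_old, so an update can seed that second term and propagate it over the new graph instead of recomputing from scratch. The names `push`, `propagate`, `rwr`, and `offset_update` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node id -> out-neighbours; nodes are 0..n-1


def push(graph: Graph, x: List[float]) -> List[float]:
    """One walk step (A^T x): each node u sends x[u] / outdeg(u) to every out-neighbour."""
    y = [0.0] * len(x)
    for u, nbrs in graph.items():
        for v in nbrs:
            y[v] += x[u] / len(nbrs)
    return y


def propagate(graph: Graph, s: List[float], c: float, tol: float = 1e-12) -> List[float]:
    """Sum s + M s + M^2 s + ... with M = (1 - c) A^T, stopping once the
    current term's L1 norm drops below tol (a looser tol trades accuracy for time)."""
    total = [0.0] * len(s)
    term = list(s)
    while sum(abs(t) for t in term) > tol:
        total = [a + b for a, b in zip(total, term)]
        term = [(1 - c) * t for t in push(graph, term)]
    return total


def rwr(graph: Graph, n: int, seed: int, c: float) -> List[float]:
    """Static RWR from scratch: r = sum_k ((1 - c) A^T)^k (c * e_seed)."""
    q = [0.0] * n
    q[seed] = c
    return propagate(graph, q, c)


def offset_update(old: Graph, new: Graph, r_old: List[float], c: float) -> List[float]:
    """Hypothetical offset update: seed (1 - c)(A_new^T - A_old^T) r_old,
    propagate it over the new graph, and merge the result into r_old."""
    seed = [(1 - c) * (a - b) for a, b in zip(push(new, r_old), push(old, r_old))]
    delta = propagate(new, seed, c)
    return [r + d for r, d in zip(r_old, delta)]


if __name__ == "__main__":
    old = {0: [1], 1: [2], 2: [0]}
    new = {0: [1, 2], 1: [2], 2: [0]}  # insert edge 0 -> 2
    r_old = rwr(old, 3, seed=0, c=0.15)
    print(offset_update(old, new, r_old, c=0.15))
    print(rwr(new, 3, seed=0, c=0.15))  # should agree up to the tolerance
```

    The offset seed is nonzero only at the old and new out-neighbours of nodes whose edges changed, which loosely mirrors "offset scores around the modified edges"; raising `tol` in `propagate` gives an approximate variant in the spirit of OSP-T, without its stated error guarantee.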

    Fast Computation of Personalized PageRank on Large Graphs

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2020. Advisor: 이상구. Computation of Personalized PageRank (PPR) in graphs is an important function that is widely used in many application domains, such as search, recommendation, and knowledge discovery. Because computing PPR is expensive, many innovative and efficient algorithms for it have been developed. However, efficient computation of PPR on very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their ability to handle dynamic graphs. In this study, we present a fast-converging algorithm that guarantees high and controllable precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation (SOR) and initial guess revision (IGR), a vector-reuse strategy. The proposed method vastly improves on traditional Power Iteration in convergence rate and computation time while retaining its simplicity and rigor. Because it can reuse previously computed vectors when refreshing PPR vectors, its update performance is also greatly enhanced. Also, because the algorithm halts as soon as it reaches a given error threshold, the trade-off between accuracy and time can be controlled flexibly, a feature lacking in both sampling-based approximation methods and exact inverse-matrix-based methods. Experiments show that the proposed algorithm converges at least 20 times faster than Power Iteration and outperforms other state-of-the-art algorithms.
    Contents:
    1 Introduction
    2 Preliminaries: Personalized PageRank
    2.1 Random Walk, PageRank, and Personalized PageRank
    2.1.1 Basics on Random Walk
    2.1.2 PageRank
    2.1.3 Personalized PageRank
    2.2 Characteristics of Personalized PageRank
    2.3 Applications of Personalized PageRank
    2.4 Previous Work on Personalized PageRank Computation
    2.4.1 Basic Algorithms
    2.4.2 Enhanced Power Iteration
    2.4.3 Bookmark Coloring Algorithm
    2.4.4 Dynamic Programming
    2.4.5 Monte-Carlo Sampling
    2.4.6 Enhanced Direct Solving
    2.5 Summary
    3 Personalized PageRank Computation with Initial Guess Revision
    3.1 Initial Guess Revision and Relaxation
    3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
    3.3 Initial Guess Construction Algorithm for Personalized PageRank
    4 Fully Personalized PageRank Algorithm with Initial Guess Revision
    4.1 FPPR with IGR
    4.2 Optimization
    4.3 Experiments
    5 Personalized PageRank Query Processing with Initial Guess Revision
    5.1 PPR Query Processing with IGR
    5.2 Optimization
    5.3 Experiments
    6 Conclusion
    Bibliography
    Appendix
    Abstract (In Korean)
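    The abstract above combines successive over-relaxation with reuse of previously computed vectors. A minimal sketch of that combination, assuming PPR is written as the linear system (I - (1 - c) W) x = c e_s with W_ij = 1/outdeg(j) for each edge j -> i: textbook Gauss-Seidel SOR sweeps that stop once the L1 change per sweep falls below a threshold. The relaxation weight `omega` is a free parameter here (the thesis derives an optimal weight, which this sketch does not reproduce), and passing an earlier vector as `x0` is plain warm-starting, a simplified stand-in for initial guess revision. Function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]  # node id -> out-neighbours; nodes are 0..n-1


def in_links(graph: Graph, n: int) -> List[List[Tuple[int, float]]]:
    """For each node i, list (j, W_ij) with W_ij = 1/outdeg(j) for every edge j -> i."""
    incoming: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for j, nbrs in graph.items():
        for i in nbrs:
            incoming[i].append((j, 1.0 / len(nbrs)))
    return incoming


def ppr_sor(
    graph: Graph,
    n: int,
    seed: int,
    c: float = 0.15,
    omega: float = 1.0,
    tol: float = 1e-10,
    x0: Optional[List[float]] = None,
    max_sweeps: int = 10_000,
) -> Tuple[List[float], int]:
    """Solve (I - (1 - c) W) x = c * e_seed by SOR; returns (x, sweeps used).
    omega = 1.0 is plain Gauss-Seidel; x0 is an optional warm start."""
    incoming = in_links(graph, n)
    b = [0.0] * n
    b[seed] = c
    x = list(x0) if x0 is not None else [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for i in range(n):
            diag, acc = 1.0, b[i]
            for j, w in incoming[i]:
                if j == i:
                    diag -= (1 - c) * w  # a self-loop belongs on the diagonal
                else:
                    acc += (1 - c) * w * x[j]  # x[j] may already be updated this sweep
            new_xi = (1 - omega) * x[i] + omega * acc / diag
            change += abs(new_xi - x[i])
            x[i] = new_xi
        if change < tol:  # halt as soon as the error threshold is met
            return x, sweep
    return x, max_sweeps


if __name__ == "__main__":
    g = {0: [1, 2], 1: [2], 2: [0]}
    x_cold, k_cold = ppr_sor(g, 3, seed=0)
    x_warm, k_warm = ppr_sor(g, 3, seed=0, x0=x_cold)  # reuse a stored vector
    print(k_cold, k_warm)
```

    Because the loop stops on a threshold rather than after a fixed number of sweeps, `tol` directly exposes the accuracy/time trade-off the abstract describes; over-relaxing with `omega` slightly above 1 can speed convergence, but this sketch does not choose it automatically.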

    Non-Conservative Diffusion and its Application to Social Network Analysis

    The random walk is fundamental to modeling dynamic processes on networks. Metrics based on the random walk have been used in many applications, from image processing to Web page ranking. However, how appropriate are random walks for modeling and analyzing social networks? We argue that, unlike a random walk, which conserves the quantity diffusing on a network, many interesting social phenomena, such as the spread of information or disease on a social network, are fundamentally non-conservative. When an individual infects her neighbor with a virus, the total amount of infection increases. We classify diffusion processes as conservative or non-conservative and show how these differences affect the choice of metrics used for network analysis, as well as our understanding of network structure and behavior. We show that Alpha-Centrality, which mathematically describes non-conservative diffusion, leads to new insights into the behavior of spreading processes on networks. We give a scalable approximate algorithm for computing Alpha-Centrality in a massive graph. We validate our approach on real-world online social network data from Digg. We show that a non-conservative metric, such as Alpha-Centrality, agrees better with empirical measures of influence than conservative metrics such as PageRank. We hope that our investigation will inspire further exploration of conservative and non-conservative metrics in social network analysis.
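    The conservative/non-conservative distinction can be seen in a single diffusion step. The sketch below contrasts a random-walk step, where a node splits its amount among its out-neighbours, with a contagion-style step, where it passes alpha times its amount to every out-neighbour. It then computes Alpha-Centrality in one common form, x = alpha A^T x + e, as a truncated series. The edge direction, the truncation length `steps`, and all function names are our assumptions; this is not the paper's scalable approximation algorithm.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # u -> nodes that u can pass the quantity to


def conservative_step(graph: Graph, x: List[float]) -> List[float]:
    """Random-walk diffusion: u splits x[u] evenly among its out-neighbours,
    so the total is preserved whenever every node has an out-edge."""
    y = [0.0] * len(x)
    for u, nbrs in graph.items():
        for v in nbrs:
            y[v] += x[u] / len(nbrs)
    return y


def non_conservative_step(graph: Graph, x: List[float], alpha: float) -> List[float]:
    """Contagion-style diffusion: u passes alpha * x[u] to *each* out-neighbour,
    so infecting two neighbours does not split the infection and the total can grow."""
    y = [0.0] * len(x)
    for u, nbrs in graph.items():
        for v in nbrs:
            y[v] += alpha * x[u]
    return y


def alpha_centrality(graph: Graph, n: int, alpha: float, steps: int = 50) -> List[float]:
    """Truncated series x = sum_{k=0}^{steps} (alpha A^T)^k e, approximating the
    solution of x = alpha A^T x + e; it converges only if alpha < 1 / lambda_max(A)."""
    term = [1.0] * n
    total = list(term)
    for _ in range(steps):
        term = non_conservative_step(graph, term, alpha)
        total = [a + b for a, b in zip(total, term)]
    return total


if __name__ == "__main__":
    g = {0: [1, 2], 1: [2], 2: [0]}
    x = [1.0, 0.0, 0.0]
    print(sum(conservative_step(g, x)))           # total unchanged
    print(sum(non_conservative_step(g, x, 0.8)))  # total grows
    print(alpha_centrality(g, 3, alpha=0.5, steps=200))
```

    For this three-node graph, solving x = 0.5 A^T x + e by hand gives x = (2.8, 2.4, 3.6), so node 2, which receives from both other nodes, scores highest; the truncated series approaches those values as `steps` grows.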

    Fast Exact CoSimRank Search on Evolving and Static Graphs

    In real Web applications, CoSimRank has been proposed as a powerful measure of node-pair similarity based on graph topology. However, existing work on CoSimRank is restricted to static graphs. When the graph is updated with new edges arriving over time, recomputing all CoSimRank scores from scratch is cost-prohibitive and impractical. In this study, we propose DCoSim, a fast dynamic scheme for accurate CoSimRank search over evolving graphs. Based on DCoSim, we also propose FCoSim, a fast scheme that greatly accelerates CoSimRank search over static graphs. Our theoretical analysis shows that DCoSim and FCoSim guarantee the exactness of CoSimRank scores. On a static graph $G$, to efficiently retrieve the CoSimRank scores $\mathbf{S}$, FCoSim is based on three ideas: (i) it first finds a "spanning polytree" $T$ over $G$; (ii) on $T$, a fast algorithm computes the CoSimRank scores $\mathbf{S}(T)$ over the spanning polytree $T$; (iii) on $G$, DCoSim is employed to compute the changes to $\mathbf{S}(T)$ in response to the delta graph $G \ominus T$. Experimental evaluations verify the superiority of DCoSim over evolving graphs and the speedup of FCoSim over its competitors on large-scale static graphs, without any loss of accuracy.
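    For reference, a minimal sketch of the quantity being searched: single-pair CoSimRank computed as a truncated series s(i, j) = sum_k c^k <p_i^(k), p_j^(k)>, where p^(k) is the k-step random-walk distribution started from the query node. This is the plain baseline that DCoSim and FCoSim are designed to avoid recomputing, not either of those schemes. The out-edge walk direction, the decay `c = 0.8`, the cutoff `steps`, and the function names are illustrative assumptions.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node id -> out-neighbours; nodes are 0..n-1


def walk_step(graph: Graph, p: List[float]) -> List[float]:
    """p_{k+1}[v] = sum over edges u -> v of p_k[u] / outdeg(u)."""
    y = [0.0] * len(p)
    for u, nbrs in graph.items():
        for v in nbrs:
            y[v] += p[u] / len(nbrs)
    return y


def cosimrank(graph: Graph, n: int, i: int, j: int, c: float = 0.8, steps: int = 30) -> float:
    """Truncated CoSimRank: two nodes are similar if random walks started
    from them reach the same nodes after the same number of steps."""
    p = [0.0] * n
    p[i] = 1.0
    q = [0.0] * n
    q[j] = 1.0
    score, weight = 0.0, 1.0
    for _ in range(steps + 1):
        score += weight * sum(a * b for a, b in zip(p, q))  # c^k <p^(k), q^(k)>
        p, q = walk_step(graph, p), walk_step(graph, q)
        weight *= c
    return score


if __name__ == "__main__":
    g = {0: [2], 1: [2]}         # nodes 0 and 1 both point to node 2
    print(cosimrank(g, 3, 0, 1))  # both walks meet at node 2 after one step: 0.8
```

    Each pair costs a fresh pair of walks over the whole graph, and every edge change can alter many walks, which is why incremental schemes for evolving graphs are attractive.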

    Random Walk-Based Large-Scale Graph Mining Exploiting Real-World Graph Characteristics

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: 강유. Numerous real-world relationships, such as social networks, hyperlink networks, and protein interaction networks, are represented as graphs. Analyzing those networks is important for understanding real-life phenomena. Among various graph analysis techniques, random walk has been widely used in many applications with satisfactory results. However, many real-world graphs are large and complicated, with diverse labels. Traditional random-walk-based methods are computationally expensive and disregard those labels when performing random walks; thus, their use has been limited on such large and complicated graphs. In this thesis, I handle the technical challenges of mining large real-world graphs based on random walk. Real-world graphs have distinct structural properties that can serve as a basis for improving the speed and quality of random walks. Based on this idea, I develop fast, scalable, and exact methods for node ranking using random walk in large-scale plain (unlabeled) networks. I also design accurate random-walk models for node ranking and relational reasoning in labeled graphs such as signed networks and knowledge bases. Through extensive experiments on various real-world graphs, I demonstrate the effectiveness of the methods and models proposed in this thesis. Compared with other existing methods, the proposed methods process up to 100 times larger graphs, require up to 130 times less memory, and run up to 9 times faster, successfully scaling to billion-scale graphs. Also, the proposed models substantially improve predictive performance on a variety of tasks in labeled graphs such as signed networks and knowledge bases, including sign prediction, link prediction, anomaly detection, and relational reasoning.
    Contents:
    Chapter 1 Overview
    1.1 Motivation
    1.2 Research Statement
    1.2.1 Research Goals and Importance
    1.2.2 Technical Challenges
    1.2.3 Main Approaches
    1.2.4 Contributions
    1.2.5 Overall Impact
    1.3 Thesis Organization
    Chapter 2 Background
    2.1 Definitions
    2.1.1 Notations on Graphs
    2.1.2 Random Walk with Restart
    2.2 Related Works
    2.2.1 Previous Methods for RWR in Plain Graphs
    2.2.2 Ranking Models in Signed Networks
    2.2.3 Relational Reasoning Models in Edge-labeled Graphs
    Chapter 3 Fast and Scalable Ranking in Large-scale Plain Graphs
    3.1 Introduction
    3.2 Preliminaries
    3.2.1 Iterative Methods for RWR
    3.2.2 Preprocessing Methods for RWR
    3.3 Proposed Method
    3.3.1 Overview
    3.3.2 BePI-B: Exploiting Graph Characteristics for Node Reordering and Block Elimination
    3.3.3 BePI-B: Incorporating an Iterative Method into Block Elimination
    3.3.4 BePI-S: Sparsifying the Schur Complement
    3.3.5 BePI: Preconditioning a Linear System for the Iterative Method
    3.4 Theoretical Results
    3.4.1 Time Complexity
    3.4.2 Space Complexity
    3.4.3 Accuracy Bound
    3.4.4 Lemmas and Proofs
    3.5 Experiments
    3.5.1 Experimental Settings
    3.5.2 Preprocessing Cost
    3.5.3 Query Cost
    3.5.4 Scalability
    3.5.5 Effects of Sparse Schur Complement and Preconditioning
    3.5.6 Effects of the Hub Selection Ratio
    3.5.7 Accuracy
    3.5.8 Comparison with the State-of-the-Art Method
    3.6 Summary
    Chapter 4 Personalized Ranking in Signed Graphs
    4.1 Introduction
    4.2 Problem Definition
    4.3 Proposed Method
    4.3.1 Signed Random Walk with Restart Model
    4.3.2 SRWR-Iter: Iterative Algorithm for Signed Random Walk with Restart
    4.3.3 SRWR-Pre: Preprocessing Algorithm for Signed Random Walk with Restart
    4.4 Experiments
    4.4.1 Experimental Settings
    4.4.2 Link Prediction Task
    4.4.3 User Preference Preservation Task
    4.4.4 Troll Identification Task
    4.4.5 Sign Prediction Task
    4.4.6 Effectiveness of Balance Attenuation Factors
    4.4.7 Performance of SRWR-Pre
    4.5 Summary
    Chapter 5 Relational Reasoning in Edge-labeled Graphs
    5.1 Introduction
    5.2 Preliminary
    5.3 Proposed Method
    5.3.1 Label Transition Observation
    5.3.2 Learning Label Transition Probabilities
    5.3.3 Multi-Labeled Random Walk with Restart
    5.3.4 Formulation for MuRWR
    5.3.5 Algorithm for MuRWR
    5.4 Theoretical Results
    5.4.1 Lemma for Solution of Label Transition Probabilities and Convexity
    5.4.2 Lemma for Recursive Equation of MuRWR Score Matrix
    5.4.3 Lemma for Spectral Radius in Convergence Theorem
    5.4.4 Lemma for Complexity Analysis
    5.5 Experiment
    5.5.1 Experimental Settings
    5.5.2 Relation Inference Task
    5.5.3 Effects of Label Weights in MuRWR
    5.5.4 Effects of Restart Probability in MuRWR
    5.5.5 Convergence of MuRWR
    5.6 Summary
    Chapter 6 Future Works
    6.1 Fast and Accurate Pseudoinverse Computation
    6.2 Fast and Scalable Signed Network Generation
    6.3 Disk-based Algorithms for Random Walk
    Chapter 7 Conclusion
    References
    Appendix
    A.1 Hub-and-Spoke Reordering Method
    A.2 Time Complexity of Sparse Matrix Multiplication
    A.3 Details of Preconditioned GMRES
    A.4 Detailed Description of Evaluation Metrics
    A.4.1 Link Prediction
    A.4.2 Troll Identification
    A.5 Discussion on Relative Trustworthiness of SRWR
    Abstract in Korean

    Quick Detection of High-degree Entities in Large Directed Networks

    In this paper, we address the problem of quickly detecting high-degree entities in large online social networks. The practical importance of this problem is attested by the many companies that continuously collect and update statistics about popular entities, usually using the degree of an entity as an approximation of its popularity. We suggest a simple, efficient, and easy-to-implement two-stage randomized algorithm that provides highly accurate solutions to this problem. For instance, our algorithm needs only one thousand API requests to find the top-100 most-followed users on Twitter, a network with approximately a billion registered users, with more than 90% precision. Our algorithm significantly outperforms existing methods and serves many different purposes, such as finding the most popular users or the most popular interest groups in social networks. An important contribution of this work is the analysis of the proposed algorithm using Extreme Value Theory, a branch of probability that studies extreme events and the properties of the largest order statistics in random samples. Using this theory, we derive an accurate prediction of the algorithm's performance and show that the number of API requests needed to find the top-k most popular entities is sublinear in the number of entities. Moreover, we formally show that the high variability among entities, expressed through heavy-tailed distributions, is the reason for the algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical way.
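    The abstract does not spell out the two stages, so the following is only a hypothetical reading, not the authors' algorithm. Stage 1 queries the followee lists of `n1` randomly sampled users and counts how often each entity appears; heavily followed entities are likely to show up often. Stage 2 queries the true follower count of the `n2` most frequent candidates and returns the top `k`. The callbacks `get_followees` and `get_follower_count` are hypothetical stand-ins for one API request each, so the budget is at most `n1 + n2` requests.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def top_k_by_degree(
    users: Sequence[int],
    get_followees: Callable[[int], List[int]],
    get_follower_count: Callable[[int], int],
    k: int,
    n1: int,
    n2: int,
    seed: int = 0,
) -> List[int]:
    """Hypothetical two-stage randomized sketch for finding high-degree entities."""
    rng = random.Random(seed)
    # Stage 1: sample users and count entity appearances in their followee lists.
    counts: Counter = Counter()
    for u in rng.sample(list(users), min(n1, len(users))):
        counts.update(get_followees(u))  # one API request per sampled user
    # Stage 2: check the true degree of the most frequently seen candidates only.
    candidates = [e for e, _ in counts.most_common(n2)]
    degrees = {e: get_follower_count(e) for e in candidates}  # one request each
    return sorted(candidates, key=lambda e: degrees[e], reverse=True)[:k]


if __name__ == "__main__":
    # Toy network: users 1..99 all follow entity 0; user 0 follows entity 1.
    followees = {u: [0] for u in range(1, 100)}
    followees[0] = [1]
    follower_count = {0: 99, 1: 1}
    print(top_k_by_degree(range(100), followees.__getitem__,
                          follower_count.__getitem__, k=1, n1=10, n2=5))
```

    The sketch relies on the heavy-tailed effect the abstract describes: when a few entities attract a large share of follows, a small random sample already surfaces them in Stage 1, so Stage 2 only needs to verify a short candidate list.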