5,995 research outputs found

    TPA: Fast, Scalable, and Accurate Method for Approximate Random Walk with Restart on Billion Scale Graphs

    Given a large graph, how can we determine similarity between nodes in a fast and accurate way? Random walk with restart (RWR) is a popular measure for this purpose and has been exploited in numerous data mining applications including ranking, anomaly detection, link prediction, and community detection. However, previous methods for computing exact RWR require prohibitive storage sizes and computational costs, and alternative methods which avoid such costs by computing approximate RWR have limited accuracy. In this paper, we propose TPA, a fast, scalable, and highly accurate method for computing approximate RWR on large graphs. TPA exploits two important properties of RWR: 1) nodes close to a seed node are likely to be revisited in following steps due to the block-wise structure of many real-world graphs, and 2) RWR scores of nodes which reside far from the seed node are proportional to their PageRank scores. Based on these two properties, TPA divides the approximate RWR problem into two subproblems called neighbor approximation and stranger approximation. In the neighbor approximation, TPA estimates RWR scores of nodes close to the seed based on the scores of a few early steps from the seed. In the stranger approximation, TPA estimates RWR scores for nodes far from the seed using their PageRank. The stranger and neighbor approximations are conducted in the preprocessing phase and the online phase, respectively. Through extensive experiments, we show that TPA requires up to 3.5x less time with up to 40x less memory space than other state-of-the-art methods for the preprocessing phase. In the online phase, TPA computes approximate RWR up to 30x faster than existing methods while maintaining high accuracy. Comment: 12 pages, 10 figures
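
    For orientation, the exact RWR that TPA approximates can be computed by plain power iteration. The sketch below is not TPA itself; it is a minimal baseline under stated assumptions: the graph is a hypothetical adjacency dictionary, every node has at least one out-neighbor, and the function name `rwr_power_iteration` and the restart probability `c=0.15` are illustrative choices, not taken from the paper.

```python
def rwr_power_iteration(adj, seed, c=0.15, tol=1e-10, max_iter=1000):
    """Exact RWR scores by power iteration (baseline, not TPA).

    adj maps each node to a list of out-neighbors. Every node is assumed to
    have at least one out-neighbor, so probability mass is conserved.
    c is the restart probability.
    """
    scores = {u: 0.0 for u in adj}
    scores[seed] = 1.0
    for _ in range(max_iter):
        nxt = {u: 0.0 for u in adj}
        # Each node spreads (1 - c) of its score evenly over its out-neighbors.
        for u, nbrs in adj.items():
            share = (1.0 - c) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        # The walker jumps back to the seed with probability c.
        nxt[seed] += c
        change = sum(abs(nxt[u] - scores[u]) for u in adj)
        scores = nxt
        if change < tol:
            break
    return scores


# Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = rwr_power_iteration(path, seed=0)
```

    Each iteration touches every edge once, which is the cost that preprocessing-based and approximate methods try to avoid paying per query on billion-scale graphs.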

    Fast and Accurate Random Walk with Restart on Dynamic Graphs with Guarantees

    Given a time-evolving graph, how can we track similarity between nodes in a fast and accurate way, with theoretical guarantees on the convergence and the error? Random Walk with Restart (RWR) is a popular measure to estimate the similarity between nodes and has been exploited in numerous applications. Many real-world graphs are dynamic with frequent insertion/deletion of edges; thus, tracking RWR scores on dynamic graphs in an efficient way has aroused much interest among data mining researchers. Recently, dynamic RWR models based on the propagation of scores across a given graph have been proposed, and have succeeded in outperforming previous approaches to computing RWR dynamically. However, those models fail to guarantee exactness and convergence time for updating RWR in a generalized form. In this paper, we propose OSP, a fast and accurate algorithm for computing dynamic RWR with insertion/deletion of nodes/edges in a directed/undirected graph. When the graph is updated, OSP first calculates offset scores around the modified edges, propagates the offset scores across the updated graph, and then merges them with the current RWR scores to get updated RWR scores. We prove the exactness of OSP and introduce OSP-T, a version of OSP which regulates a trade-off between accuracy and computation time by using error tolerance $\epsilon$. Given restart probability $c$, OSP-T guarantees to return RWR scores with $O(\epsilon / c)$ error in $O(\log(\epsilon/2) / \log(1-c))$ iterations. Through extensive experiments, we show that OSP tracks RWR exactly up to 4605x faster than existing static RWR methods on dynamic graphs, and OSP-T requires up to 15x less time with 730x lower L1 norm error and 3.3x lower rank error than other state-of-the-art dynamic RWR methods. Comment: 10 pages, 8 figures
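
    The three steps named in the abstract (compute offset scores, propagate them over the updated graph, merge with the current scores) can be sketched as follows. This is an illustration derived from the RWR fixed-point equation, not the authors' implementation: the function names, the adjacency-dictionary format, and the assumption that the node set stays the same across the update are all choices made here.

```python
def propagate(adj, x, scale):
    """Return scale * (A^T x), where A is the row-normalized adjacency matrix."""
    out = {u: 0.0 for u in adj}
    for u, nbrs in adj.items():
        if nbrs and x[u] != 0.0:
            share = scale * x[u] / len(nbrs)
            for v in nbrs:
                out[v] += share
    return out


def rwr(adj, seed, c=0.15, tol=1e-12):
    """Static RWR by power iteration, used here only as a reference."""
    r = {u: (1.0 if u == seed else 0.0) for u in adj}
    while True:
        nxt = propagate(adj, r, 1.0 - c)
        nxt[seed] += c
        change = sum(abs(nxt[u] - r[u]) for u in adj)
        r = nxt
        if change < tol:
            return r


def update_rwr(old_adj, new_adj, r_old, c=0.15, tol=1e-12):
    """Update RWR scores after an edge change by propagating offset scores.

    Subtracting the fixed-point equations of the old and new graphs gives
    r_new - r_old = sum_i ((1 - c) * B^T)^i x0 with
    x0 = (1 - c) * (B^T - A^T) r_old, where A and B are the old and new
    row-normalized adjacency matrices.
    """
    after = propagate(new_adj, r_old, 1.0 - c)
    before = propagate(old_adj, r_old, 1.0 - c)
    x = {u: after[u] - before[u] for u in new_adj}  # offset scores
    r_new = dict(r_old)
    while sum(abs(v) for v in x.values()) >= tol:
        for u in x:
            r_new[u] += x[u]                        # merge offsets
        x = propagate(new_adj, x, 1.0 - c)          # propagate one more hop
    return r_new
```

    The offset vector is nonzero only near the modified edges at first, and its L1 norm shrinks by at least a factor of `1 - c` per hop, so the loop stops after a number of iterations that depends only on the tolerance and `c`.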

    Supervised Random Walks: Predicting and Recommending Links in Social Networks

    Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future, or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
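
    To make the idea of attribute-guided walks concrete, the sketch below turns edge feature vectors into edge strengths and then into transition probabilities. The logistic form of the strength function, the feature layout, and all names are assumptions for illustration; the abstract only states that a learned function assigns strengths to edges, and the training step is not shown.

```python
import math


def edge_strength(features, weights):
    """Assumed logistic strength: 1 / (1 + exp(-w . psi)) for edge features psi."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def transition_probabilities(edge_features, weights):
    """Normalize edge strengths into random-walk transition probabilities.

    edge_features maps each node u to a dict {v: feature vector of edge (u, v)}.
    Returns a dict u -> {v: probability of stepping from u to v}.
    """
    probs = {}
    for u, nbrs in edge_features.items():
        strengths = {v: edge_strength(f, weights) for v, f in nbrs.items()}
        total = sum(strengths.values())
        probs[u] = {v: s / total for v, s in strengths.items()}
    return probs


# Hypothetical example: two outgoing edges with two features each.
features = {"a": {"b": [1.0, 0.0], "c": [0.0, 1.0]}}
probs = transition_probabilities(features, weights=[2.0, -2.0])
```

    With these weights the walker leaving `a` prefers `b`, because the first feature raises the strength and the second lowers it; learning the weights so that future link targets receive high visit probability is the supervised part described in the abstract.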

    The Minimum Wiener Connector

    The Wiener index of a graph is the sum of all pairwise shortest-path distances between its vertices. In this paper we study the novel problem of finding a minimum Wiener connector: given a connected graph $G=(V,E)$ and a set $Q \subseteq V$ of query vertices, find a subgraph of $G$ that connects all query vertices and has minimum Wiener index. We show that the Minimum Wiener Connector problem admits a polynomial-time (albeit impractical) exact algorithm for the special case where the number of query vertices is bounded. We show that in general the problem is NP-hard, and has no PTAS unless $\mathbf{P} = \mathbf{NP}$. Our main contribution is a constant-factor approximation algorithm running in time $\widetilde{O}(|Q||E|)$. Thorough experimentation on a large variety of real-world graphs confirms that our method returns smaller and denser solutions than other methods, and does so by adding to the query set $Q$ a small number of important vertices (i.e., vertices with high centrality). Comment: Published in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
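
    The objective being minimized is easy to state in code. The sketch below computes the Wiener index of an unweighted, connected, undirected graph with one breadth-first search per vertex. It only evaluates the objective for a given subgraph and is not the approximation algorithm from the paper; the function name and input format are assumptions.

```python
from collections import deque


def wiener_index(adj):
    """Sum of shortest-path distances over all unordered vertex pairs.

    adj maps each vertex to a list of neighbors (undirected, unweighted).
    Raises ValueError if the graph is not connected.
    """
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    # Every unordered pair was counted twice, once from each endpoint.
    return total // 2


path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # path on 4 vertices
star3 = {"h": [1, 2, 3], 1: ["h"], 2: ["h"], 3: ["h"]}  # hub with 3 leaves
```

    On four vertices the path has Wiener index 10 while the star has 9, a small instance of why routing query vertices through a central vertex tends to give a lower objective value.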

    A Fast Computation Technique for Personalized PageRank on Large Graphs

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 8. 이상구.

    Computation of Personalized PageRank (PPR) in graphs is an important function that is widely utilized in myriad application domains such as search, recommendation, and knowledge discovery. Because the computation of PPR is an expensive process, a good number of innovative and efficient algorithms for computing PPR have been developed. However, efficient computation of PPR within very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their capability of handling dynamic graphs. In this paper, we present a fast converging algorithm that guarantees high and controlled precision. We improve the convergence rate of the traditional Power Iteration method by adopting successive over-relaxation and initial guess revision, a vector reuse strategy. The proposed method vastly improves on the traditional Power Iteration in terms of convergence rate and computation time, while retaining its simplicity and strictness. Since it can reuse previously computed vectors for refreshing PPR vectors, its update performance is also greatly enhanced. Also, since the algorithm halts as soon as it reaches a given error threshold, we can flexibly control the trade-off between accuracy and time, a feature lacking in both sampling-based approximation methods and fully exact methods. Experiments show that the proposed algorithm is at least 20 times faster than the Power Iteration and outperforms other state-of-the-art algorithms.

    Contents:
    1 Introduction
    2 Preliminaries: Personalized PageRank
      2.1 Random Walk, PageRank, and Personalized PageRank
        2.1.1 Basics on Random Walk
        2.1.2 PageRank
        2.1.3 Personalized PageRank
      2.2 Characteristics of Personalized PageRank
      2.3 Applications of Personalized PageRank
      2.4 Previous Work on Personalized PageRank Computation
        2.4.1 Basic Algorithms
        2.4.2 Enhanced Power Iteration
        2.4.3 Bookmark Coloring Algorithm
        2.4.4 Dynamic Programming
        2.4.5 Monte-Carlo Sampling
        2.4.6 Enhanced Direct Solving
      2.5 Summary
    3 Personalized PageRank Computation with Initial Guess Revision
      3.1 Initial Guess Revision and Relaxation
      3.2 Finding Optimal Weight of Successive Over Relaxation for PPR
      3.3 Initial Guess Construction Algorithm for Personalized PageRank
    4 Fully Personalized PageRank Algorithm with Initial Guess Revision
      4.1 FPPR with IGR
      4.2 Optimization
      4.3 Experiments
    5 Personalized PageRank Query Processing with Initial Guess Revision
      5.1 PPR Query Processing with IGR
      5.2 Optimization
      5.3 Experiments
    6 Conclusion
    Bibliography
    Appendix
    Abstract (in Korean)
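
    Two properties claimed in this abstract, halting at a chosen error threshold and reusing a previously computed vector as the starting point, can be illustrated with ordinary power iteration. The sketch below is a simplified stand-in under stated assumptions: it does not implement successive over-relaxation or the thesis's initial guess revision, and the function name, the `init` parameter, and the toy graphs are hypothetical.

```python
def ppr_power_iteration(adj, seed, c=0.15, tol=1e-8, init=None):
    """Personalized PageRank by power iteration with an optional warm start.

    adj maps each node to a non-empty list of out-neighbors.
    tol is the L1 change between iterations at which the loop halts.
    init, if given, is a previously computed score vector to start from.
    Returns (scores, number of iterations performed).
    """
    if init is not None:
        scores = dict(init)
    else:
        scores = {u: (1.0 if u == seed else 0.0) for u in adj}
    iterations = 0
    while True:
        nxt = {u: 0.0 for u in adj}
        for u, nbrs in adj.items():
            share = (1.0 - c) * scores[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[seed] += c
        iterations += 1
        change = sum(abs(nxt[u] - scores[u]) for u in adj)
        scores = nxt
        if change < tol:
            return scores, iterations


old_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
new_graph = {0: [1, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2]}  # edge 0-2 added

# Looser tolerance halts earlier; tighter tolerance costs more iterations.
_, n_loose = ppr_power_iteration(old_graph, 0, tol=1e-4)
old_scores, n_tight = ppr_power_iteration(old_graph, 0, tol=1e-10)

# After the update, start from the old vector instead of from scratch.
warm_scores, n_warm = ppr_power_iteration(new_graph, 0, tol=1e-10, init=old_scores)
```

    Because the iteration is a contraction, any starting vector converges to the same scores; a warm start changes only how many iterations are needed, which is typically fewer when the graph change is small.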

    Adaptive image retrieval using a graph model for semantic feature integration

    The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, e.g., images, audio and text documents).

    Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

    We study the problem of quick detection of top-k Personalized PageRank lists. This problem has a number of important applications such as finding local cuts in large graphs, estimation of similarity distance and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that when finding top-k Personalized PageRank lists two observations are important. Firstly, it is crucial that we quickly detect the top-k most important neighbours of a node, while the exact order in the top-k list as well as the exact values of PageRank are far less crucial. Secondly, a small number of wrong elements in top-k lists does not really degrade the quality of the lists, but tolerating them can lead to significant computational savings. Based on these two key observations we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide performance evaluation of the proposed methods and supply stopping criteria. Then, we apply the methods to the person name disambiguation problem. The developed algorithm for the person name disambiguation problem achieved second place in the WePS 2010 competition.
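
    A common way to estimate Personalized PageRank by simulation is to run many short walks from the seed and record where each one stops. The sketch below uses that end-point estimator with a fixed number of walks. It is a generic illustration, not the specific estimators or stopping criteria of the paper; the function name, the fixed walk budget, and the treatment of dangling nodes (the walk simply ends there) are assumptions.

```python
import random
from collections import Counter


def mc_top_k_ppr(adj, seed, k, c=0.15, num_walks=20000, rng=None):
    """Estimate the top-k Personalized PageRank list by Monte Carlo walks.

    Each walk starts at the seed. At every step it stops with probability c
    and otherwise moves to a uniformly random out-neighbor. The fraction of
    walks that stop at a node estimates that node's PPR score.
    Returns a list of (node, estimated score), highest estimate first.
    """
    rng = rng or random.Random(0)  # fixed default seed for reproducibility
    ends = Counter()
    for _ in range(num_walks):
        node = seed
        while rng.random() >= c:      # continue with probability 1 - c
            nbrs = adj[node]
            if not nbrs:              # assumed: a dangling node ends the walk
                break
            node = rng.choice(nbrs)
        ends[node] += 1
    return [(v, count / num_walks) for v, count in ends.most_common(k)]


# Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3, seed at node 0.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
top2 = mc_top_k_ppr(graph, seed=0, k=2)
```

    The membership of the top-k list stabilizes long before the individual score estimates become precise, which matches the abstract's observation that the exact order and values matter less than finding the right nodes quickly.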

    Random Walk-based Large-scale Graph Mining Exploiting Real-world Graph Properties

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 2. 강유.

    Numerous real-world relationships, such as social networks, hyperlink networks, and protein interaction networks, are represented as graphs. Analyzing those networks is important for understanding real-life phenomena. Among various graph analysis techniques, random walk has been widely used in many applications with satisfactory results. However, many real-world graphs are large and complicated, with diverse labels. Traditional random walk based methods incur heavy computational costs and disregard those labels when performing random walks; thus, their utilization has been limited in such large and complicated graphs. In this thesis, I handle the technical challenges of mining large real-world graphs based on random walk. Real-world graphs have distinct structural properties which become a basis for increasing the performance of the random walk in terms of speed and quality. Based upon this idea, I develop fast, scalable, and exact methods for node ranking using random walk in large-scale plain networks. I also design accurate models using random walks for node ranking and relational reasoning in labeled graphs such as signed networks and knowledge bases. Through extensive experiments on various real-world graphs, I demonstrate the effectiveness of the methods and models proposed by this thesis. The proposed methods process 100 times larger graphs, and require up to 130 times less memory with up to 9 times faster speed compared to other existing methods, successfully scaling to billion-scale graphs. Also, the proposed models substantially improve the predictive performance of a variety of tasks in labeled graphs such as signed networks and knowledge bases.

    Contents:
    Chapter 1 Overview
      1.1 Motivation
      1.2 Research Statement
        1.2.1 Research Goals and Importance
        1.2.2 Technical Challenges
        1.2.3 Main Approaches
        1.2.4 Contributions
        1.2.5 Overall Impact
      1.3 Thesis Organization
    Chapter 2 Background
      2.1 Definitions
        2.1.1 Notations on Graphs
        2.1.2 Random Walk with Restart
      2.2 Related Works
        2.2.1 Previous Methods for RWR in Plain Graphs
        2.2.2 Ranking Models in Signed Networks
        2.2.3 Relational Reasoning Models in Edge-labeled Graphs
    Chapter 3 Fast and Scalable Ranking in Large-scale Plain Graphs
      3.1 Introduction
      3.2 Preliminaries
        3.2.1 Iterative Methods for RWR
        3.2.2 Preprocessing Methods for RWR
      3.3 Proposed Method
        3.3.1 Overview
        3.3.2 BePI-B: Exploiting Graph Characteristics for Node Reordering and Block Elimination
        3.3.3 BePI-B: Incorporating an Iterative Method into Block Elimination
        3.3.4 BePI-S: Sparsifying the Schur Complement
        3.3.5 BePI: Preconditioning a Linear System for the Iterative Method
      3.4 Theoretical Results
        3.4.1 Time Complexity
        3.4.2 Space Complexity
        3.4.3 Accuracy Bound
        3.4.4 Lemmas and Proofs
      3.5 Experiments
        3.5.1 Experimental Settings
        3.5.2 Preprocessing Cost
        3.5.3 Query Cost
        3.5.4 Scalability
        3.5.5 Effects of Sparse Schur Complement and Preconditioning
        3.5.6 Effects of the Hub Selection Ratio
        3.5.7 Accuracy
        3.5.8 Comparison with the State-of-the-Art Method
      3.6 Summary
    Chapter 4 Personalized Ranking in Signed Graphs
      4.1 Introduction
      4.2 Problem Definition
      4.3 Proposed Method
        4.3.1 Signed Random Walk with Restart Model
        4.3.2 SRWR-Iter: Iterative Algorithm for Signed Random Walk with Restart
        4.3.3 SRWR-Pre: Preprocessing Algorithm for Signed Random Walk with Restart
      4.4 Experiments
        4.4.1 Experimental Settings
        4.4.2 Link Prediction Task
        4.4.3 User Preference Preservation Task
        4.4.4 Troll Identification Task
        4.4.5 Sign Prediction Task
        4.4.6 Effectiveness of Balance Attenuation Factors
        4.4.7 Performance of SRWR-Pre
      4.5 Summary
    Chapter 5 Relational Reasoning in Edge-labeled Graphs
      5.1 Introduction
      5.2 Preliminary
      5.3 Proposed Method
        5.3.1 Label Transition Observation
        5.3.2 Learning Label Transition Probabilities
        5.3.3 Multi-Labeled Random Walk with Restart
        5.3.4 Formulation for MuRWR
        5.3.5 Algorithm for MuRWR
      5.4 Theoretical Results
        5.4.1 Lemma for Solution of Label Transition Probabilities and Convexity
        5.4.2 Lemma for Recursive Equation of MuRWR Score Matrix
        5.4.3 Lemma for Spectral Radius in Convergence Theorem
        5.4.4 Lemma for Complexity Analysis
      5.5 Experiment
        5.5.1 Experimental Settings
        5.5.2 Relation Inference Task
        5.5.3 Effects of Label Weights in MuRWR
        5.5.4 Effects of Restart Probability in MuRWR
        5.5.5 Convergence of MuRWR
      5.6 Summary
    Chapter 6 Future Works
      6.1 Fast and Accurate Pseudoinverse Computation
      6.2 Fast and Scalable Signed Network Generation
      6.3 Disk-based Algorithms for Random Walk
    Chapter 7 Conclusion
    References
    Appendix
      A.1 Hub-and-Spoke Reordering Method
      A.2 Time Complexity of Sparse Matrix Multiplication
      A.3 Details of Preconditioned GMRES
      A.4 Detailed Description of Evaluation Metrics
        A.4.1 Link Prediction
        A.4.2 Troll Identification
      A.5 Discussion on Relative Trustworthiness of SRWR
    Abstract in Korean