3 research outputs found

    Fast Graph Isomorphism Algorithms Using Pairwise Color Refinement and Efficient Backtracking (쌍별 색 개선과 효율적인 백트래킹을 이용한 빠른 그래프 동형 알고리즘)

    Doctoral thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2021. 구건모.

    Graph isomorphism is a core problem in graph analysis across many domains, including social networks, bioinformatics, and chemistry. As real-world graphs grow larger, applications demand practically fast algorithms that can run on large-scale graphs. Existing approaches, however, show limited performance on large-scale real-world graphs in either time or space. Many applications also require graph isomorphism query processing, a natural generalization of graph isomorphism to multiple graphs: given a set of graphs, find all those isomorphic to a query graph. In this thesis we present fast algorithms for graph isomorphism and graph isomorphism query processing. First, we present a new approach to graph isomorphism, which is the framework of pairwise color refinement and efficient backtracking. Within the framework, we introduce three efficient techniques, which together lead to a much faster and more scalable algorithm for graph isomorphism. Experiments on real-world datasets show that our algorithm outperforms state-of-the-art solutions by up to several orders of magnitude in terms of running time. Second, we develop an efficient algorithm for graph isomorphism query processing. We use a two-level index based on degree sequences and color-label distributions. Experimental results on real datasets show that our algorithm is orders of magnitude faster than the state-of-the-art algorithms in terms of index construction time, and it runs faster than existing algorithms in terms of query processing time as the graph sizes increase.

    The Korean abstract states the same contributions and quantifies the results: on real graph data, the graph isomorphism algorithm is on average thousands of times faster than the fastest existing algorithms, and the query processing algorithm is on average thousands of times faster in indexing time and, on medium and large graphs, on average tens of times faster in query processing time.

    Contents:
    1. Introduction
        1.1. Background
        1.2. Organization
    2. Preliminaries
        2.1. Notation
        2.2. Problem Definitions
        2.3. Related Work
    3. Graph Isomorphism
        3.1. Algorithm Overview
        3.2. Pairwise Color Refinement and Binary Cell Mapping
        3.3. Compressed Candidate Space
        3.4. Backtracking and Partial Failing Sets
        3.5. Performance Evaluation
            3.5.1. Comparing with Existing Solutions
            3.5.2. Effectiveness of Individual Techniques
            3.5.3. Analysis with Varying Degrees of Similarity
            3.5.4. Sensitivity Analysis
    4. Graph Isomorphism Query Processing
        4.1. Canonical Coloring
        4.2. Index Construction
        4.3. Query Processing
        4.4. Performance Evaluation
            4.4.1. Varying Number of Hops
            4.4.2. Varying Number of Data Graphs
    5. Conclusion
        5.1. Summary
        5.2. Future Directions
    Abstract (in Korean)
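To make the idea of refining colors on two graphs at once more concrete, here is a minimal Python sketch. It is not the thesis's algorithm: it is plain color refinement (1-dimensional Weisfeiler-Leman) run on both graphs with a shared color palette, stopping early when the color histograms diverge. The function name `refine_pair` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import Counter


def refine_pair(adj_g, adj_h):
    """Refine vertex colors of two undirected graphs with a shared palette.

    adj_g, adj_h: dict mapping each vertex to the set of its neighbors.
    Returns (colors_g, colors_h) when the stable colorings have equal
    histograms, or None when the graphs are certainly not isomorphic.
    A non-None result is only a necessary condition for isomorphism.
    """
    if len(adj_g) != len(adj_h):
        return None
    # Initial color of a vertex: its degree.
    col_g = {v: len(ns) for v, ns in adj_g.items()}
    col_h = {v: len(ns) for v, ns in adj_h.items()}
    while True:
        # Isomorphic graphs must agree on how many vertices have each color.
        if Counter(col_g.values()) != Counter(col_h.values()):
            return None
        # Signature: own color plus the sorted multiset of neighbor colors.
        sig_g = {v: (col_g[v], tuple(sorted(col_g[u] for u in adj_g[v])))
                 for v in adj_g}
        sig_h = {v: (col_h[v], tuple(sorted(col_h[u] for u in adj_h[v])))
                 for v in adj_h}
        # One palette for both graphs, so equal signatures get equal colors.
        palette = {s: i for i, s in
                   enumerate(sorted(set(sig_g.values()) | set(sig_h.values())))}
        new_g = {v: palette[sig_g[v]] for v in adj_g}
        new_h = {v: palette[sig_h[v]] for v in adj_h}
        stable_g = len(set(new_g.values())) == len(set(col_g.values()))
        stable_h = len(set(new_h.values())) == len(set(col_h.values()))
        if stable_g and stable_h:
            if Counter(new_g.values()) != Counter(new_h.values()):
                return None
            return new_g, new_h
        col_g, col_h = new_g, new_h
```

A path on five vertices and a triangle plus a disjoint edge share the degree sequence (1, 1, 2, 2, 2), yet `refine_pair` rejects the pair after one round. A 6-cycle and two disjoint triangles pass the check even though they are not isomorphic, which is why refinement is paired with a backtracking search in practice.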

    Efficient Path Enumeration and Structural Clustering on Massive Graphs

    Graph analysis plays a crucial role in understanding the relationships and structures within complex systems. This thesis focuses on addressing fundamental problems in graph analysis, including hop-constrained s-t simple path (HC-s-t path) enumeration, batch HC-s-t path query processing, and graph structural clustering (SCAN). The objective is to develop efficient and scalable distributed algorithms to tackle these challenges, particularly in the context of billion-scale graphs. We first explore the problem of HC-s-t path enumeration. Existing solutions for this problem often suffer from inefficiency and scalability limitations, especially when dealing with billion-scale graphs. To overcome these drawbacks, we propose a novel hybrid search paradigm specifically tailored for HC-s-t path enumeration. This paradigm combines different search strategies to effectively explore the solution space. Building upon this paradigm, we devise a distributed enumeration algorithm that follows a divide-and-conquer strategy, incorporates fruitless exploration pruning, and optimizes memory consumption. Experimental evaluations on various datasets demonstrate that our algorithm achieves a significant speedup compared to existing solutions, even on datasets where they encounter out-of-memory issues. Secondly, we address the problem of batch HC-s-t path query processing. In real-world scenarios, it is common to issue multiple HC-s-t path queries simultaneously and process them as a batch. However, existing solutions often focus on optimizing the processing performance of individual queries, disregarding the benefits of processing queries concurrently. To bridge this gap, we propose the concept of HC-s path queries, which captures the common computation among different queries. We design a two-phase HC-s path query detection algorithm to identify the shared computation for a given set of HC-s-t path queries. 
Based on the detected HC-s path queries, we develop an efficient HC-s-t path enumeration algorithm that effectively shares the common computation. Extensive experiments on diverse datasets validate the efficiency and scalability of our algorithm for processing multiple HC-s-t path queries concurrently. Thirdly, we investigate the problem of graph structural clustering (SCAN) in billion-scale graphs. Existing distributed solutions for SCAN often lack efficiency or suffer from high memory consumption, making them impractical for large-scale graphs. To overcome these challenges, we propose a fine-grained clustering framework specifically tailored for SCAN. This framework enables effective identification of cohesive subgroups within a graph. Building upon this framework, we devise a distributed SCAN algorithm that minimizes communication overhead and reduces memory consumption throughout the execution. We also incorporate an effective workload balance mechanism that dynamically adjusts to handle skewed workloads. Experimental evaluations on real-world graphs demonstrate the efficiency and scalability of our proposed algorithm. Overall, this thesis contributes novel distributed algorithms for HC-s-t path enumeration, batch HC-s-t path query processing, and graph structural clustering. The proposed algorithms address the efficiency and scalability challenges in graph analysis, particularly on billion-scale graphs. Extensive experimental evaluations validate the superiority of our algorithms compared to existing solutions, enabling efficient and scalable graph analysis in complex systems.
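For readers unfamiliar with the HC-s-t path problem, the following Python sketch shows a simple single-machine baseline: a depth-first search that enumerates all simple paths from s to t with at most k hops, pruning any branch that can no longer reach t within the remaining hop budget. This is not the hybrid or distributed algorithm the thesis proposes. The function name `hc_st_paths` and the adjacency-list input format are assumptions made for illustration.

```python
from collections import deque


def hc_st_paths(adj, s, t, k):
    """Enumerate simple s-t paths with at most k edges in a directed graph.

    adj: dict mapping each vertex to a list of its out-neighbors.
    Returns a list of paths, each path a list of vertices from s to t.
    """
    # Build reverse adjacency so a BFS from t gives each vertex's
    # shortest hop distance to t.
    radj = {}
    for u, vs in adj.items():
        radj.setdefault(u, [])
        for v in vs:
            radj.setdefault(v, []).append(u)
    dist = {t: 0}
    queue = deque([t])
    while queue:
        x = queue.popleft()
        for y in radj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)

    paths = []
    path = [s]
    on_path = {s}

    def dfs(u):
        if u == t:
            paths.append(list(path))
            return
        remaining = k - (len(path) - 1)  # hops still available from u
        for v in adj.get(u, ()):
            if v in on_path:
                continue  # keep the path simple
            # Prune: after stepping to v, t must be reachable in remaining - 1 hops.
            if dist.get(v, k + 1) > remaining - 1:
                continue
            path.append(v)
            on_path.add(v)
            dfs(v)
            path.pop()
            on_path.discard(v)

    if dist.get(s, k + 1) <= k:
        dfs(s)
    return paths
```

The distance check is a basic form of pruning fruitless exploration: a neighbor that is too far from t is never expanded, so the search only visits prefixes that could still be completed within the hop budget in the full graph.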

    Efficient Graph Isomorphism Query Processing using Degree Sequences and Color-Label Distributions

    Given a set of data graphs and a query graph, graph isomorphism query processing is the problem of finding all the data graphs that are isomorphic to the query graph. Graph isomorphism query processing is a core problem in graph analysis of various application domains. In existing approaches, index construction or query processing takes much time as the graph sizes increase. In this paper, we propose an efficient algorithm for graph isomorphism query processing. We introduce the color-label distribution which represents the canonical coloring of a vertex-labeled graph. Based on degree sequences and color-label distributions, we introduce a two-level index, which helps us efficiently solve graph isomorphism query processing. Experimental results on real datasets show that the proposed algorithm is orders of magnitude faster than the state-of-the-art algorithms in terms of index construction time, and it runs faster than existing algorithms in terms of query processing time as the graph sizes increase.
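The abstract does not define the two-level index in detail, so the following Python sketch only illustrates the general shape such an index could take. The first level is keyed by the sorted degree sequence. The second level is keyed by a stand-in for the color-label distribution: the count of vertices per (stable refinement color, label) pair. The class `TwoLevelIndex`, the function `color_label_distribution`, and this particular definition of the distribution are all assumptions, not the paper's definitions.

```python
from collections import Counter, defaultdict


def degree_sequence(adj):
    """First-level key: sorted vertex degrees."""
    return tuple(sorted(len(ns) for ns in adj.values()))


def color_label_distribution(adj, labels):
    """Second-level key (hypothetical definition): how many vertices share
    each (stable color, label) pair after label-seeded color refinement."""
    ordered = sorted(set(labels.values()))
    color = {v: ordered.index(labels[v]) for v in adj}
    while True:
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        # Name colors by signature rank, which does not depend on vertex names.
        rank = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        stable = len(rank) == len(set(color.values()))
        color = {v: rank[sig[v]] for v in adj}
        if stable:
            break
    counts = Counter((color[v], labels[v]) for v in adj)
    return tuple(sorted(counts.items()))


class TwoLevelIndex:
    """Maps degree sequence, then color-label distribution, to graph ids."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(list))

    def add(self, graph_id, adj, labels):
        key1 = degree_sequence(adj)
        key2 = color_label_distribution(adj, labels)
        self._index[key1][key2].append(graph_id)

    def candidates(self, adj, labels):
        """Return ids of data graphs that pass both filters for the query."""
        level2 = self._index.get(degree_sequence(adj))
        if level2 is None:
            return []
        return list(level2.get(color_label_distribution(adj, labels), []))
```

Both keys are invariant under vertex renaming, so isomorphic graphs always land in the same bucket. The converse does not hold, so in this sketch the returned candidates would still need an exact isomorphism test before being reported as answers.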