
    μŒλ³„ 색 κ°œμ„ κ³Ό 효율적인 λ°±νŠΈλž˜ν‚Ήμ„ μ΄μš©ν•œ λΉ λ₯Έ κ·Έλž˜ν”„ λ™ν˜• μ•Œκ³ λ¦¬μ¦˜

    Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2021.8. 구건모.

    Graph isomorphism is a core problem in graph analysis across domains such as social networks, bioinformatics, and chemistry. As real-world graphs grow larger, applications demand practically fast algorithms that can run on large-scale graphs. Existing approaches, however, show limited performance on large-scale real-world graphs in either time or space. Many applications also require graph isomorphism query processing, a natural generalization of graph isomorphism to multiple graphs. In this thesis we present fast algorithms for graph isomorphism and graph isomorphism query processing. First, we present a new approach to graph isomorphism: a framework of pairwise color refinement and efficient backtracking. Within the framework, we introduce three efficient techniques, which together lead to a much faster and more scalable algorithm for graph isomorphism. Experiments on real-world datasets show that our algorithm outperforms state-of-the-art solutions by up to several orders of magnitude in running time. Second, we develop an efficient algorithm for graph isomorphism query processing. It uses a two-level index built from degree sequences and color-label distributions. Experimental results on real datasets show that our algorithm is orders of magnitude faster than state-of-the-art algorithms in index construction time, and that it runs faster than existing algorithms in query processing time as the graph sizes increase.

    Abstract (translated from Korean): The graph isomorphism problem is a core problem in graph analysis across application domains such as social network services, bioinformatics, and cheminformatics. As the graph data handled in practice grows in size, the need for graph isomorphism algorithms that can process large-scale graphs is increasing. Existing graph isomorphism algorithms, however, show limitations in time or space on large-scale graphs. Some applications also require graph isomorphism query processing, that is, finding all graphs among a collection that are isomorphic to a given query graph. This thesis proposes algorithms that quickly solve the graph isomorphism problem and the graph isomorphism query processing problem on large-scale real-world graph data. First, it proposes a fast and scalable algorithm for graph isomorphism, introducing a framework that consists of pairwise color refinement and efficient backtracking. Three efficient techniques are used within this framework. Experiments on real graph data show that the algorithm is on average thousands of times faster than the fastest existing algorithms. Second, it develops an efficient algorithm for graph isomorphism query processing, which uses an index based on degree sequences and color-label distributions. Experiments on real graph data show that the algorithm is consistently thousands of times faster on average than existing algorithms in indexing time, and tens of times faster on average in query processing time on medium- and large-scale graphs.

    Table of contents:
    1. Introduction
        1.1. Background
        1.2. Organization
    2. Preliminaries
        2.1. Notation
        2.2. Problem Definitions
        2.3. Related Work
    3. Graph Isomorphism
        3.1. Algorithm Overview
        3.2. Pairwise Color Refinement and Binary Cell Mapping
        3.3. Compressed Candidate Space
        3.4. Backtracking and Partial Failing Sets
        3.5. Performance Evaluation
            3.5.1. Comparing with Existing Solutions
            3.5.2. Effectiveness of Individual Techniques
            3.5.3. Analysis with Varying Degrees of Similarity
            3.5.4. Sensitivity Analysis
    4. Graph Isomorphism Query Processing
        4.1. Canonical Coloring
        4.2. Index Construction
        4.3. Query Processing
        4.4. Performance Evaluation
            4.4.1. Varying Number of Hops
            4.4.2. Varying Number of Data Graphs
    5. Conclusion
        5.1. Summary
        5.2. Future Directions
    Abstract (in Korean)
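
    The abstract names color refinement as a building block but does not spell it out. As background only, here is a minimal sketch of the classical (1-dimensional Weisfeiler-Leman) color refinement, run jointly on two graphs so that the resulting colors are comparable. It is not the thesis's pairwise variant, its binary cell mapping, or its backtracking step. The function names `refine_colors` and `may_be_isomorphic` and the adjacency-dict input format are assumptions made for illustration.

```python
from collections import Counter


def refine_colors(adj_g, adj_h):
    """Classical color refinement on two graphs at once (illustrative sketch).

    adj_g, adj_h: dict mapping each vertex to a list of its neighbors.
    Returns two dicts (vertex -> stable color) with colors comparable
    across the two graphs.
    """
    # Tag vertices so the two vertex sets cannot collide.
    adj = {("g", v): [("g", u) for u in nbrs] for v, nbrs in adj_g.items()}
    adj.update({("h", v): [("h", u) for u in nbrs] for v, nbrs in adj_h.items()})

    # Initial color of a vertex: its degree.
    color = {v: len(nbrs) for v, nbrs in adj.items()}
    num_colors = len(set(color.values()))
    while True:
        # Signature: own color plus the sorted multiset of neighbor colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj}
        # Rename signatures to small integers, shared by both graphs.
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: ids[sig[v]] for v in adj}
        if len(ids) == num_colors:  # the partition did not get finer: stable
            break
        num_colors = len(ids)

    col_g = {v: c for (tag, v), c in color.items() if tag == "g"}
    col_h = {v: c for (tag, v), c in color.items() if tag == "h"}
    return col_g, col_h


def may_be_isomorphic(adj_g, adj_h):
    """Necessary (not sufficient) test: stable color histograms must match."""
    col_g, col_h = refine_colors(adj_g, adj_h)
    return Counter(col_g.values()) == Counter(col_h.values())


# A 3-vertex path and a triangle are told apart by refinement alone.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(may_be_isomorphic(path, triangle))

# A 6-cycle and two disjoint triangles are not isomorphic, yet both are
# 2-regular, so refinement cannot separate them and a search is still needed.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(may_be_isomorphic(cycle6, two_triangles))
```

    The second example shows why refinement is typically paired with a search phase: matching color histograms only rule graphs out, they never confirm an isomorphism.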