
3 research outputs found

Fast Graph Isomorphism Algorithms Using Pairwise Color Refinement and Efficient Backtracking (쌍별 색 개선과 효율적인 백트래킹을 이용한 빠른 그래프 동형 알고리즘)

Author: 구건모
Publication venue: Graduate School, Seoul National University
Publication date: 01/08/2021

Thesis (Ph.D.), Department of Computer Science and Engineering, College of Engineering, Graduate School of Seoul National University, August 2021. 구건모.

Graph isomorphism is a core problem in graph analysis across many domains, including social networks, bioinformatics, and cheminformatics. As real-world graphs grow larger, applications demand algorithms that are fast in practice on large-scale graphs. Existing approaches, however, show limited performance on large-scale real-world graphs in either time or space. Many applications also require graph isomorphism query processing, a natural generalization of graph isomorphism to multiple graphs: given a collection of graphs and a query graph, find every graph in the collection that is isomorphic to the query. This thesis presents fast algorithms for both graph isomorphism and graph isomorphism query processing on large real-world graphs. First, we present a new approach to graph isomorphism: a framework of pairwise color refinement and efficient backtracking. Within this framework, we introduce three efficient techniques that together yield a much faster and more scalable graph isomorphism algorithm. Experiments on real-world datasets show that our algorithm outperforms state-of-the-art solutions by up to several orders of magnitude in running time, and is on average thousands of times faster. Second, we develop an efficient algorithm for graph isomorphism query processing that uses a two-level index built on degree sequences and color-label distributions. Experiments on real datasets show that this algorithm is orders of magnitude faster than state-of-the-art algorithms in index construction time (consistently thousands of times faster on average), and that it outperforms existing algorithms in query processing time as graph sizes increase (on average tens of times faster on medium- and large-scale graphs).

Contents
1. Introduction
  1.1. Background
  1.2. Organization
2. Preliminaries
  2.1. Notation
  2.2. Problem Definitions
  2.3. Related Work
3. Graph Isomorphism
  3.1. Algorithm Overview
  3.2. Pairwise Color Refinement and Binary Cell Mapping
  3.3. Compressed Candidate Space
  3.4. Backtracking and Partial Failing Sets
  3.5. Performance Evaluation
    3.5.1. Comparing with Existing Solutions
    3.5.2. Effectiveness of Individual Techniques
    3.5.3. Analysis with Varying Degrees of Similarity
    3.5.4. Sensitivity Analysis
4. Graph Isomorphism Query Processing
  4.1. Canonical Coloring
  4.2. Index Construction
  4.3. Query Processing
  4.4. Performance Evaluation
    4.4.1. Varying Number of Hops
    4.4.2. Varying Number of Data Graphs
5. Conclusion
  5.1. Summary
  5.2. Future Directions
Abstract (in Korean)
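The abstract does not explain how pairwise color refinement, binary cell mapping, or partial failing sets work. The following is therefore only a minimal sketch of the general idea behind a refinement-plus-backtracking framework, not the thesis's algorithm. It assumes simple undirected graphs given as adjacency lists over vertices 0..n-1. It runs classical color refinement on both graphs with one shared palette, so that a color means the same thing in both graphs. It rejects the pair when the color histograms differ, and otherwise backtracks over candidates that share a vertex's color. The names `refine_jointly` and `find_isomorphism` are hypothetical.

```python
from collections import Counter


def refine_jointly(adj_g, adj_h):
    """Classical color refinement run on two graphs at once.

    Each round renames signatures through one shared palette, so a color
    means the same thing in both graphs and their histograms are comparable.
    """
    col_g, col_h = [0] * len(adj_g), [0] * len(adj_h)
    while True:
        # Signature: own color plus the sorted multiset of neighbor colors.
        sig_g = [(col_g[v], tuple(sorted(col_g[u] for u in nbrs))) for v, nbrs in enumerate(adj_g)]
        sig_h = [(col_h[v], tuple(sorted(col_h[u] for u in nbrs))) for v, nbrs in enumerate(adj_h)]
        palette = {s: i for i, s in enumerate(sorted(set(sig_g) | set(sig_h)))}
        new_g = [palette[s] for s in sig_g]
        new_h = [palette[s] for s in sig_h]
        # The partition only ever splits; stop once no class was split.
        if len(palette) == len(set(col_g) | set(col_h)):
            return new_g, new_h
        col_g, col_h = new_g, new_h


def find_isomorphism(adj_g, adj_h):
    """Return a dict mapping G's vertices to H's, or None if none exists."""
    n = len(adj_g)
    if n != len(adj_h):
        return None
    col_g, col_h = refine_jointly(adj_g, adj_h)
    if Counter(col_g) != Counter(col_h):
        return None  # refinement alone proves non-isomorphism
    edges_g = [set(nbrs) for nbrs in adj_g]
    edges_h = [set(nbrs) for nbrs in adj_h]
    # Visit vertices in small color classes first: they have the fewest candidates.
    class_size = Counter(col_g)
    order = sorted(range(n), key=lambda v: class_size[col_g[v]])
    mapping, used = {}, [False] * n

    def extend(i):
        if i == n:
            return True
        v = order[i]
        for w in range(n):
            # Candidates for v are unused vertices of H with v's color.
            if used[w] or col_h[w] != col_g[v]:
                continue
            # Adjacency to every already-mapped vertex must be preserved.
            if all((u in edges_g[v]) == (x in edges_h[w]) for u, x in mapping.items()):
                mapping[v], used[w] = w, True
                if extend(i + 1):
                    return True
                del mapping[v]
                used[w] = False
        return False

    return dict(mapping) if extend(0) else None
```

Two disjoint triangles and a 6-cycle are both 2-regular. Color refinement therefore gives every vertex the same color and cannot tell the two graphs apart, so in this sketch the backtracking step is what rejects that pair. Ordering vertices by color-class size is one simple heuristic. The thesis lists further techniques (a compressed candidate space and partial failing sets) that this sketch does not attempt.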

SNU Open Repository and Archive

Efficient Path Enumeration and Structural Clustering on Massive Graphs

Author: Hao Kongzhang
Publication venue: UNSW, Sydney
Publication date: 01/01/2023

Graph analysis plays a crucial role in understanding the relationships and structures within complex systems. This thesis focuses on addressing fundamental problems in graph analysis, including hop-constrained s-t simple path (HC-s-t path) enumeration, batch HC-s-t path query processing, and graph structural clustering (SCAN). The objective is to develop efficient and scalable distributed algorithms to tackle these challenges, particularly in the context of billion-scale graphs. We first explore the problem of HC-s-t path enumeration. Existing solutions for this problem often suffer from inefficiency and scalability limitations, especially when dealing with billion-scale graphs. To overcome these drawbacks, we propose a novel hybrid search paradigm specifically tailored for HC-s-t path enumeration. This paradigm combines different search strategies to effectively explore the solution space. Building upon this paradigm, we devise a distributed enumeration algorithm that follows a divide-and-conquer strategy, incorporates fruitless exploration pruning, and optimizes memory consumption. Experimental evaluations on various datasets demonstrate that our algorithm achieves a significant speedup compared to existing solutions, even on datasets where they encounter out-of-memory issues. Secondly, we address the problem of batch HC-s-t path query processing. In real-world scenarios, it is common to issue multiple HC-s-t path queries simultaneously and process them as a batch. However, existing solutions often focus on optimizing the processing performance of individual queries, disregarding the benefits of processing queries concurrently. To bridge this gap, we propose the concept of HC-s path queries, which captures the common computation among different queries. We design a two-phase HC-s path query detection algorithm to identify the shared computation for a given set of HC-s-t path queries. 
Based on the detected HC-s path queries, we develop an efficient HC-s-t path enumeration algorithm that shares the common computation among queries. Extensive experiments on diverse datasets validate the efficiency and scalability of our algorithm for processing multiple HC-s-t path queries concurrently. Thirdly, we investigate the problem of graph structural clustering (SCAN) on billion-scale graphs. Existing distributed solutions for SCAN often lack efficiency or suffer from high memory consumption, making them impractical for large-scale graphs. To overcome these challenges, we propose a fine-grained clustering framework specifically tailored for SCAN, which enables effective identification of cohesive subgroups within a graph. Building upon this framework, we devise a distributed SCAN algorithm that minimizes communication overhead and reduces memory consumption throughout execution. We also incorporate a workload-balancing mechanism that adjusts dynamically to skewed workloads. Experimental evaluations on real-world graphs demonstrate the efficiency and scalability of the proposed algorithm. Overall, this thesis contributes novel distributed algorithms for HC-s-t path enumeration, batch HC-s-t path query processing, and graph structural clustering. The proposed algorithms address efficiency and scalability challenges in graph analysis, particularly on billion-scale graphs, and extensive experimental evaluations show that they outperform existing solutions, enabling efficient and scalable analysis of complex systems.
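A common single-machine baseline for HC-s-t path enumeration is a depth-first search that extends a simple path from s. It prunes a branch as soon as the remaining hop budget is smaller than the BFS hop distance to t. The sketch below shows only that baseline, under assumed inputs: a directed graph stored as a dict of adjacency lists. It is not the thesis's hybrid search paradigm or its distributed algorithm, and `hc_st_paths` and `hop_dist_to_target` are hypothetical names.

```python
from collections import deque


def hop_dist_to_target(adj, t, k):
    """BFS over reversed edges: hop distance from each vertex to t, up to k."""
    rev = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            rev.setdefault(v, []).append(u)
    dist, queue = {t: 0}, deque([t])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # vertices more than k hops from t can never be used
        for u in rev.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def hc_st_paths(adj, s, t, k):
    """Yield every simple s-t path with at most k edges in the directed graph adj."""
    dist = hop_dist_to_target(adj, t, k)
    path, on_path = [s], {s}

    def dfs(v):
        if v == t:
            yield list(path)
            return
        for w in adj.get(v, ()):
            # After stepping to w, len(path) edges are used; k - len(path) remain.
            if w in on_path or dist.get(w, k + 1) > k - len(path):
                continue
            path.append(w)
            on_path.add(w)
            yield from dfs(w)
            path.pop()
            on_path.remove(w)

    if dist.get(s, k + 1) <= k:
        yield from dfs(s)
```

The BFS bound ignores which vertices are already on the current path. The search can therefore still enter branches whose only short routes to t are blocked. The abstract lists pruning this kind of fruitless exploration as one of the thesis's contributions.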

UNSWorks
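SCAN, the structural clustering problem in the abstract above, is standardly defined over closed neighborhoods N[u]. The structural similarity of adjacent vertices is σ(u, v) = |N[u] ∩ N[v]| / √(|N[u]|·|N[v]|). A vertex u is a core when at least μ vertices of N[u] have σ ≥ ε, and clusters grow outward from cores. The sequential sketch below implements those standard definitions on an undirected graph stored as a dict of neighbor sets. It does not model the thesis's fine-grained distributed framework, its communication, or its load balancing. The interface `scan(adj, eps, mu)` is hypothetical.

```python
import math


def scan(adj, eps, mu):
    """Sequential SCAN on an undirected graph given as {vertex: set_of_neighbors}.

    Returns (cluster, hubs): cluster maps each clustered vertex to a cluster id;
    hubs are unclustered vertices adjacent to two or more clusters. Any other
    unclustered vertex is an outlier.
    """
    closed = {u: adj[u] | {u} for u in adj}  # closed neighborhoods N[u]

    def sigma(u, v):
        # Structural similarity |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|).
        return len(closed[u] & closed[v]) / math.sqrt(len(closed[u]) * len(closed[v]))

    eps_nbrs = {u: [v for v in closed[u] if sigma(u, v) >= eps] for u in adj}
    cores = {u for u in adj if len(eps_nbrs[u]) >= mu}
    cluster, next_id = {}, 0
    for c in adj:
        if c not in cores or c in cluster:
            continue
        # Grow a new cluster from core c through eps-similar neighbors of cores.
        cluster[c] = next_id
        stack = [c]
        while stack:
            u = stack.pop()
            for v in eps_nbrs[u]:
                if v not in cluster:
                    cluster[v] = next_id  # a border vertex keeps the first cluster that reaches it
                    if v in cores:
                        stack.append(v)
        next_id += 1
    hubs = {u for u in adj if u not in cluster
            and len({cluster[v] for v in adj[u] if v in cluster}) >= 2}
    return cluster, hubs
```

Take two 4-cliques joined through one extra vertex adjacent to one member of each. With `eps=0.7` and `mu=3`, the sketch returns the two cliques as separate clusters and classifies the joining vertex as a hub.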

Efficient Graph Isomorphism Query Processing using Degree Sequences and Color-Label Distributions

Authors: Galil Z, Gu G, Han WS, Italiano GF, Nam Y, Park K
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022

Given a set of data graphs and a query graph, graph isomorphism query processing is the problem of finding all the data graphs that are isomorphic to the query graph. It is a core problem in graph analysis across various application domains. In existing approaches, index construction or query processing becomes slow as graph sizes increase. In this paper, we propose an efficient algorithm for graph isomorphism query processing. We introduce the color-label distribution, which represents the canonical coloring of a vertex-labeled graph. Based on degree sequences and color-label distributions, we introduce a two-level index that lets us solve graph isomorphism query processing efficiently. Experimental results on real datasets show that the proposed algorithm is orders of magnitude faster than state-of-the-art algorithms in index construction time, and that it runs faster than existing algorithms in query processing time as graph sizes increase.
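The abstract names degree sequences and color-label distributions as the two index levels. It does not define the color-label distribution beyond saying that it represents the canonical coloring of a vertex-labeled graph. The sketch below therefore assumes a stand-in: label-seeded color refinement with a palette shared by all graphs, summarized as a sorted histogram of (color, label) pairs. Under this stand-in, isomorphic graphs always receive the same two keys, so no answer is missed. Non-isomorphic graphs can collide, so every candidate is verified, here by brute force that is feasible only for tiny graphs. `TwoLevelIndex`, `color_label_distribution`, and the other names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def degree_sequence(adj):
    """Level-1 key: vertex degrees sorted in non-increasing order."""
    return tuple(sorted((len(nbrs) for nbrs in adj), reverse=True))


def color_label_distribution(adj, labels, palette):
    """Level-2 key (assumed stand-in): label-seeded color refinement, then the
    sorted histogram of (color, label) pairs. `palette` is shared by all graphs
    so that equal signatures always receive equal color ids."""
    colors = [palette.setdefault(("label", lab), len(palette)) for lab in labels]
    while True:
        sigs = [(colors[v], tuple(sorted(colors[u] for u in nbrs))) for v, nbrs in enumerate(adj)]
        new = [palette.setdefault(s, len(palette)) for s in sigs]
        if len(set(new)) == len(set(colors)):  # no class was split: stable
            break
        colors = new
    return tuple(sorted(Counter(zip(colors, labels)).items()))


def is_isomorphic(adj_a, lab_a, adj_b, lab_b):
    """Brute-force label-preserving isomorphism check (tiny graphs only)."""
    n = len(adj_a)
    if n != len(adj_b):
        return False
    edges_a = {(u, v) for u in range(n) for v in adj_a[u]}
    edges_b = {(u, v) for u in range(n) for v in adj_b[u]}
    return any(
        all(lab_a[v] == lab_b[p[v]] for v in range(n))
        and {(p[u], p[v]) for u, v in edges_a} == edges_b
        for p in permutations(range(n))
    )


class TwoLevelIndex:
    """degree sequence -> color-label distribution -> list of graph ids."""

    def __init__(self):
        self.palette, self.levels, self.graphs = {}, {}, {}

    def add(self, gid, adj, labels):
        self.graphs[gid] = (adj, labels)
        level2 = self.levels.setdefault(degree_sequence(adj), {})
        key = color_label_distribution(adj, labels, self.palette)
        level2.setdefault(key, []).append(gid)

    def query(self, adj, labels):
        level2 = self.levels.get(degree_sequence(adj), {})
        candidates = level2.get(color_label_distribution(adj, labels, self.palette), [])
        # Keys can collide for non-isomorphic graphs, so verify each candidate.
        return [g for g in candidates if is_isomorphic(adj, labels, *self.graphs[g])]
```

Level 1 is a cheap filter, a sorted degree list. Level 2 is consulted only inside the bucket whose degree sequence matches the query's, which mirrors the two-level layout the abstract describes. A practical version would replace the brute-force check with a real isomorphism test.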

Pohang University of Science and Technology (POSTECH)

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma