2 research outputs found

    Efficient Path Enumeration and Structural Clustering on Massive Graphs

    Full text link
    Graph analysis plays a crucial role in understanding the relationships and structures within complex systems. This thesis focuses on addressing fundamental problems in graph analysis, including hop-constrained s-t simple path (HC-s-t path) enumeration, batch HC-s-t path query processing, and graph structural clustering (SCAN). The objective is to develop efficient and scalable distributed algorithms to tackle these challenges, particularly in the context of billion-scale graphs. We first explore the problem of HC-s-t path enumeration. Existing solutions for this problem often suffer from inefficiency and scalability limitations, especially when dealing with billion-scale graphs. To overcome these drawbacks, we propose a novel hybrid search paradigm specifically tailored for HC-s-t path enumeration. This paradigm combines different search strategies to effectively explore the solution space. Building upon this paradigm, we devise a distributed enumeration algorithm that follows a divide-and-conquer strategy, incorporates fruitless exploration pruning, and optimizes memory consumption. Experimental evaluations on various datasets demonstrate that our algorithm achieves a significant speedup compared to existing solutions, even on datasets where they encounter out-of-memory issues. Secondly, we address the problem of batch HC-s-t path query processing. In real-world scenarios, it is common to issue multiple HC-s-t path queries simultaneously and process them as a batch. However, existing solutions often focus on optimizing the processing performance of individual queries, disregarding the benefits of processing queries concurrently. To bridge this gap, we propose the concept of HC-s path queries, which captures the common computation among different queries. We design a two-phase HC-s path query detection algorithm to identify the shared computation for a given set of HC-s-t path queries. Based on the detected HC-s path queries, we develop an efficient HC-s-t path enumeration algorithm that effectively shares the common computation. Extensive experiments on diverse datasets validate the efficiency and scalability of our algorithm for processing multiple HC-s-t path queries concurrently. Thirdly, we investigate the problem of graph structural clustering (SCAN) in billion-scale graphs. Existing distributed solutions for SCAN often lack efficiency or suffer from high memory consumption, making them impractical for large-scale graphs. To overcome these challenges, we propose a fine-grained clustering framework specifically tailored for SCAN. This framework enables effective identification of cohesive subgroups within a graph. Building upon this framework, we devise a distributed SCAN algorithm that minimizes communication overhead and reduces memory consumption throughout the execution. We also incorporate an effective workload balance mechanism that dynamically adjusts to handle skewed workloads. Experimental evaluations on real-world graphs demonstrate the efficiency and scalability of our proposed algorithm. Overall, this thesis contributes novel distributed algorithms for HC-s-t path enumeration, batch HC-s-t path query processing, and graph structural clustering. The proposed algorithms address the efficiency and scalability challenges in graph analysis, particularly on billion-scale graphs. Extensive experimental evaluations validate the superiority of our algorithms compared to existing solutions, enabling efficient and scalable graph analysis in complex systems
    corecore