
    Efficiently computing graph similarity and graph connectivity

    In many real applications, entities and their relationships can be modeled as a graph G = (V, E), where entities of interest are represented by vertices in V and their relationships by edges in E. With the proliferation of graph applications in domains such as physics, chemistry, biology, social networks, information networks, and road networks, significant research effort has been devoted to managing and analyzing graph data. This thesis studies two important problems in graph analytics: computing the similarity between graphs (graph edit distance computation), and computing graph connectivity in a distributed setting (connected components, biconnected components, and 2-edge-connected components).

    Firstly, we study Graph Edit Distance (GED), an important measure for similarity-based analysis between two graphs; computing GED is a primitive operator in graph database analysis. We develop a unified framework that can be instantiated as either a best-first search approach, AStar+, or a depth-first search approach, DFS+. In addition, we design anchor-aware lower bound estimation techniques that compute tighter lower bounds for intermediate search states, which significantly reduces the search spaces of both AStar+ and DFS+. We also propose efficient techniques for computing these lower bounds. Last but not least, based on our unified framework, we compare AStar+ with DFS+ in terms of time and space complexity, and recommend AStar+ over DFS+ because it explores a much smaller search space.

    Secondly, we study the distributed computation of Connected Components (CCs) and BiConnected Components (BCCs) of a graph. We propose a new paradigm based on graph decomposition that reduces the total communication cost from O(m * #supersteps) to O(m) for computing both CCs and BCCs, where m is the number of edges in the graph and #supersteps is the number of supersteps. Moreover, although the total computation costs of our techniques are theoretically almost the same as those of existing techniques, they are smaller in practice.

    Thirdly, we study the distributed computation of 2-Edge-Connected Components (ECCs) of a graph. We are the first to study this problem, and we propose a new paradigm based on graph decomposition that computes ECCs with O(m) total communication cost. We also extend our techniques to compute all articulation points and bridges in a graph.
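To make the GED search described above concrete, here is a minimal sketch of best-first (A*) search over vertex mappings. It rests on several assumptions:

- unit edit costs;
- simple undirected graphs with labeled vertices and unlabeled edges;
- a simple label-multiset lower bound.

It is not the thesis's AStar+, and the bound is not its anchor-aware lower bound. The function name `ged_astar` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def ged_astar(labels1, edges1, labels2, edges2):
    """Hypothetical sketch: exact GED under unit costs via A* over vertex mappings.

    labels*: list of vertex labels (vertex ids are list indices).
    edges*: iterable of undirected (u, v) pairs in a simple graph; edges are unlabeled.
    """
    n1, n2 = len(labels1), len(labels2)
    adj1 = {frozenset(e) for e in edges1}
    adj2 = {frozenset(e) for e in edges2}

    def lower_bound(mapping):
        # mapping[i] is the g2 image of g1 vertex i, or None if i is deleted.
        k = len(mapping)
        used2 = {v for v in mapping if v is not None}
        rest2 = [v for v in range(n2) if v not in used2]
        # Assumed simple bound: unmatched labels among the unmapped vertices ...
        shared = Counter(labels1[k:]) & Counter(labels2[v] for v in rest2)
        vertex_lb = max(n1 - k, len(rest2)) - sum(shared.values())
        # ... plus the difference in counts of edges touching an unmapped vertex.
        e1 = sum(1 for e in adj1 if max(e) >= k)
        e2 = sum(1 for e in adj2 if not e <= used2)
        return vertex_lb + abs(e1 - e2)

    def extend_cost(mapping, u, v):
        # Cost of mapping g1 vertex u to v (None = delete u), given earlier choices.
        cost = 1 if v is None or labels1[u] != labels2[v] else 0
        for w in range(u):
            in1 = frozenset((u, w)) in adj1
            img = mapping[w]
            in2 = v is not None and img is not None and frozenset((v, img)) in adj2
            cost += in1 != in2  # edge deleted or inserted
        return cost

    def completion_cost(mapping):
        # Insert every unmapped g2 vertex and every g2 edge touching one.
        used2 = {v for v in mapping if v is not None}
        return (n2 - len(used2)) + sum(1 for e in adj2 if not e <= used2)

    tie = count()  # tie-breaker so the heap never compares mappings
    heap = [(lower_bound(()), next(tie), 0, (), False)]
    while heap:
        _, _, g, mapping, done = heapq.heappop(heap)
        if done:
            return g  # first complete state popped is optimal (bound is admissible)
        u = len(mapping)
        if u == n1:
            total = g + completion_cost(mapping)
            heapq.heappush(heap, (total, next(tie), total, mapping, True))
            continue
        used2 = {v for v in mapping if v is not None}
        for v in [x for x in range(n2) if x not in used2] + [None]:
            g_child = g + extend_cost(mapping, u, v)
            child = mapping + (v,)
            heapq.heappush(heap, (g_child + lower_bound(child), next(tie), g_child, child, False))


# A triangle vs. a path on the same labels: deleting one edge suffices.
print(ged_astar(["C", "C", "C"], [(0, 1), (1, 2), (0, 2)],
                ["C", "C", "C"], [(0, 1), (1, 2)]))  # 1
```

A tighter lower bound, such as the anchor-aware bounds the abstract mentions, would prune more states. The search skeleton stays the same.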
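The O(m * #supersteps) communication term can be illustrated with a typical vertex-centric (Pregel-style) baseline for CCs: HashMin label propagation. Every vertex repeatedly sends its smallest known label along its edges. Each superstep can therefore send up to 2m messages, one per edge direction.

The sketch below simulates that baseline sequentially and counts supersteps and messages. It assumes this baseline is representative of the "existing techniques" and is not the thesis's decomposition-based paradigm. The name `hashmin_cc` is hypothetical.

```python
from collections import defaultdict


def hashmin_cc(n, edges):
    """Hypothetical sketch: simulate HashMin CC labeling and count supersteps/messages."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = list(range(n))   # each vertex starts with its own id
    active = set(range(n))   # vertices that must broadcast their label
    supersteps = messages = 0
    while active:
        supersteps += 1
        inbox = defaultdict(list)
        # Send phase: every active vertex messages all neighbors (BSP semantics).
        for u in active:
            for w in adj[u]:
                inbox[w].append(label[u])
                messages += 1
        # Receive phase: adopt a smaller label and become active if it changed.
        active = set()
        for w, msgs in inbox.items():
            smallest = min(msgs)
            if smallest < label[w]:
                label[w] = smallest
                active.add(w)
    return label, supersteps, messages


labels, steps, msgs = hashmin_cc(4, [(0, 1), (1, 2), (2, 3)])
print(labels, steps, msgs)  # [0, 0, 0, 0] 4 15
```

On a path, the number of supersteps grows with the diameter, so the total message count grows with both m and #supersteps. A paradigm with O(m) total communication avoids exactly this cost.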
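As a reference for the outputs named in the ECC paragraph above, here is a sketch of the classic sequential approach. It computes bridges and articulation points with Tarjan-style DFS low-link values. ECCs are then taken as the connected components left after deleting every bridge.

This is a single-machine method, not the thesis's distributed decomposition-based technique. The function names are hypothetical, and the DFS is recursive for clarity, so very large graphs would need an iterative version.

```python
from collections import defaultdict


def bridges_and_articulation_points(n, edges):
    """Hypothetical sketch: return (bridge edge indices, articulation vertices)."""
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc = [-1] * n  # DFS discovery time, -1 = unvisited
    low = [0] * n    # earliest discovery time reachable from the subtree
    bridges, cut = set(), set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for w, eid in adj[u]:
            if eid == parent_edge:
                continue  # skip only the tree edge itself (parallel edges still count)
            if disc[w] == -1:
                children += 1
                dfs(w, eid)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:
                    bridges.add(eid)  # subtree of w cannot reach above u
                if parent_edge is not None and low[w] >= disc[u]:
                    cut.add(u)
            else:
                low[u] = min(low[u], disc[w])
        if parent_edge is None and children > 1:
            cut.add(u)  # a DFS root is a cut vertex iff it has 2+ tree children

    for s in range(n):
        if disc[s] == -1:
            dfs(s, None)
    return bridges, cut


def two_edge_connected_components(n, edges):
    """Component id per vertex after removing all bridges."""
    bridges, _ = bridges_and_articulation_points(n, edges)
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        if i not in bridges:
            adj[u].append(v)
            adj[v].append(u)
    comp, c = [-1] * n, 0
    for s in range(n):
        if comp[s] == -1:
            comp[s], stack = c, [s]
            while stack:
                u = stack.pop()
                for w in adj[u]:
                    if comp[w] == -1:
                        comp[w] = c
                        stack.append(w)
            c += 1
    return comp


# Two triangles joined by the edge (2, 3), which is a bridge.
E = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
print(bridges_and_articulation_points(6, E))  # ({3}, {2, 3})
print(two_edge_connected_components(6, E))    # [0, 0, 0, 1, 1, 1]
```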