Efficient query processing on graph databases

By James Sheung-Chak Cheng

Abstract

Graphs are a powerful modeling tool for representing and understanding objects and their relationships. In recent years, the volume of graph data has increased rapidly, yet the performance of query processing on large graph databases remains inadequate. In this thesis, we develop efficient methods for processing queries on large graph databases. Our work focuses on the two most popular types of graph databases: transaction graph databases, which consist of a large set of small graphs (e.g., chemistry and bioinformatics databases), and large graphs (e.g., the Web and social networks).

For transaction graph databases, we develop a novel index based on the concept of frequent subgraphs (FGs). We devise a clustering technique to partition the set of FGs, then organize the clusters into a multi-level search tree in which each tree node is an inverted index. We further employ features to reduce the cost of index probing at each inverted index. The most distinctive feature of our index is that no candidate verification is required for processing queries that are FGs, whereas candidate verification is known to be the most expensive step in other existing graph indexes. For queries that are not FGs, we model the user queries as a stream and propose the concept of frequently asked queries (FAQs) over a sliding window in the stream, so that non-FG queries that are FAQs can also be answered without candidate verification. When a query is neither an FG nor frequently asked, we use the FAQs to obtain a subset of the answer set, so that verification is required only for a small subset of candidates. Our extensive experiments show that our index is orders of magnitude more efficient than the state-of-the-art graph indexes. We also devise an efficient update algorithm for maintaining the index.

For large graphs, we propose a novel partition-and-conquer query processing paradigm. First, we partition a large graph into a set of small communities based on the concept of modularity. Because a community is significantly smaller than the whole graph, we can efficiently compute the connection between query nodes that lie in the same community. Then, we extend the intra-community connection to the inter-community level by using a community hierarchy tree. We control the quality of the intra-community connection by computing the maximum amount of information flow of the nodes/paths in the answer graph, while the quality of the inter-community connection is ensured by the community hierarchy. Our experimental results show that our algorithm obtains the same high-quality results as the state-of-the-art algorithm, while running three orders of magnitude faster and consuming significantly less memory.
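
The abstract does not say which modularity measure or partitioning algorithm the thesis uses. As an illustration only, the sketch below assumes the standard Newman-Girvan modularity for an undirected, unweighted graph and scores two candidate partitions of a toy graph. The function name, the edge list, and the community labels are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (assumed measure, not
    necessarily the one used in the thesis).

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in one community
print(q_split > q_merged)
```

Under this assumed measure, the partition that places each triangle in its own community scores higher than the single-community partition, which is the kind of signal a modularity-based partitioner would use to pick small, densely connected communities.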

Topics: Querying (Computer science), Graph theory -- Data processing
Year: 2008
OAI identifier: oai:repository.ust.hk:1783.1-3538