Hybrid algorithms for subgraph pattern queries in graph databases
Numerous methods have been proposed over the years for subgraph query processing, as it is central to graph analytics. Existing work is fragmented into two major categories. Methods in the filter-then-verify (FTV) category first construct an index of the DB graphs. Given a query, the index is used to filter out graphs that cannot contain the query. On the remaining graphs, a subgraph isomorphism algorithm is applied to verify whether each graph indeed contains the query. A second category of algorithms is mainly concerned with optimizing the Subgraph Isomorphism (SI) testing process (an NP-Complete problem) in order to find all occurrences of the query within each DB graph, also known as the matching problem. The current research trend is to totally dismiss FTV methods, because SI methods have been shown to enjoy much shorter query execution times and because of the alleged high costs of managing the DB graph index in FTV methods. Thus, a number of new SI methods are being proposed annually. In the current work, we initially study the performance of the latest SI algorithms over datasets consisting of a large number of graphs. With our study, we evaluate the algorithms’ performance and we provide comparison details with former studies. As a second step, we combine the powerful filtering of a top-performing FTV method, with the various SI methods, which leads to the best practice conclusion that SI and FTV shouldn’t be thought of as disjoint types of solutions, as their union achieves better results than any one of them individually. Specifically, we experimentally analyze and quantify the (positive) impact of including the essence of indexed FTV methods within SI methods, showing that query processing times can be significantly improved at modest additional memory costs. We show that these results hold over a variety of well-known SI methods and across several real and synthetic datasets. 
As such, hybrids of this type reveal a missed opportunity and a blind spot in the related literature and trends.
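The filter-then-verify idea described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the feature choice (label pairs of edges), the brute-force verifier, and all names (`edge_features`, `build_index`, `ftv_query`, the toy `database`) are assumptions made only for illustration. A graph is assumed to be a pair of a vertex-to-label dict and a set of undirected edges.

```python
from itertools import permutations


def edge_features(labels, edges):
    """Hypothetical feature set: the sorted label pair of every edge."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}


def contains(db_graph, query):
    """Verify step: brute-force search over injective, label-preserving mappings."""
    g_labels, g_edges = db_graph
    q_labels, q_edges = query
    g_adj = {frozenset(e) for e in g_edges}
    q_vertices = list(q_labels)
    for image in permutations(g_labels, len(q_vertices)):
        mapping = dict(zip(q_vertices, image))
        labels_ok = all(q_labels[q] == g_labels[mapping[q]] for q in q_vertices)
        edges_ok = all(
            frozenset((mapping[u], mapping[v])) in g_adj for u, v in q_edges
        )
        if labels_ok and edges_ok:
            return True
    return False


def build_index(database):
    """Offline part of the filter step: precompute features for each DB graph."""
    return {name: edge_features(*graph) for name, graph in database.items()}


def ftv_query(database, index, query):
    """Filter by feature containment, then verify only the surviving graphs."""
    q_feats = edge_features(*query)
    candidates = [name for name, feats in index.items() if q_feats <= feats]
    answers = [name for name in candidates if contains(database[name], query)]
    return candidates, answers


# Toy database (made-up data): g3 passes the filter but fails verification.
database = {
    "g1": ({1: "C", 2: "C", 3: "O"}, {(1, 2), (2, 3)}),
    "g2": ({1: "C", 2: "N"}, {(1, 2)}),
    "g3": ({1: "C", 2: "O", 3: "C", 4: "C"}, {(1, 2), (3, 4)}),
}
index = build_index(database)
query = ({"a": "C", "b": "C", "c": "O"}, {("a", "b"), ("b", "c")})
candidates, answers = ftv_query(database, index, query)
```

In this toy run the index discards `g2` without any isomorphism test, while `g3` is a false positive that only the verification step rejects. A hybrid in the spirit of the abstract would replace `contains` with a dedicated SI algorithm.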
Using materialized views for answering graph pattern queries
Discovering patterns in graphs by evaluating graph pattern queries involving direct (edge-to-edge mapping) and reachability (edge-to-path mapping) relationships under homomorphisms on data graphs has been extensively studied. Previous studies have aimed to reduce the evaluation time of graph pattern queries, which can be high because of the potentially numerous matches on large data graphs.
In this work, the concept of the summary graph is developed to improve the evaluation of tree pattern queries and graph pattern queries. The summary graph first filters out candidate matches which violate certain reachability constraints, and then finds local matches of query edges. This reduces redundancy in the representation of the query results and allows for computation sharing during the generation of these results. Methods using materialized graph pattern views are developed to improve the efficiency of graph pattern query evaluation. A view is materialized as a summary graph, which compactly records all the homomorphisms of the view to the data graph. View usability is characterized in terms of query edge coverage to provide necessary and sufficient conditions for answering queries using views, and algorithms are developed for determining view usability and for summary graph construction.
Experimental evaluation shows that the methods using summary graphs and their related concepts outperform previous state-of-the-art approaches. It also demonstrates that the view materialization method outperforms, by several orders of magnitude, a state-of-the-art approach which does not use materialized views, and substantially improves upon its scalability.
Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins
We study the problem of optimizing subgraph queries using the new worst-case
optimal join plans. Worst-case optimal plans evaluate queries by matching one
query vertex at a time using multiway intersections. The core problem in
optimizing worst-case optimal plans is to pick an ordering of the query
vertices to match. We design a cost-based optimizer that (i) picks efficient
query vertex orderings for worst-case optimal plans; and (ii) generates hybrid
plans that mix traditional binary joins with worst-case optimal style multiway
intersections. Our cost metric combines the cost of binary joins with a new
cost metric called intersection-cost. The plan space of our optimizer contains
plans that are not in the plan spaces based on tree decompositions from prior
work. In addition to our optimizer, we describe an adaptive technique that
changes the orderings of the worst-case optimal sub-plans during query
execution. We demonstrate the effectiveness of the plans our optimizer picks
and of our adaptive technique through extensive experiments. Our optimizer is
integrated into the Graphflow DBMS.
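The vertex-at-a-time evaluation that this abstract attributes to worst-case optimal plans can be sketched in Python as follows. This is only an illustration under stated assumptions, not Graphflow's implementation: the graph is undirected and unlabeled, matches are required to use distinct data vertices, and the names `build_adjacency` and `match` are hypothetical.

```python
def build_adjacency(edges):
    """Adjacency sets for an undirected data graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def match(adj, query_edges, order):
    """Match query vertices one at a time, following `order`.

    Candidates for each new query vertex are the intersection of the
    adjacency sets of its already-matched query neighbours.
    """
    results = []

    def extend(partial):
        if len(partial) == len(order):
            results.append(dict(partial))
            return
        q = order[len(partial)]
        matched_neighbours = [
            partial[p]
            for p in partial
            if (p, q) in query_edges or (q, p) in query_edges
        ]
        if matched_neighbours:
            # Multiway intersection over all matched neighbours at once.
            candidates = set.intersection(*(adj[n] for n in matched_neighbours))
        else:
            candidates = set(adj)
        # Assumption: distinct query vertices map to distinct data vertices.
        for v in candidates - set(partial.values()):
            partial[q] = v
            extend(partial)
            del partial[q]

    extend({})
    return results


# Toy data graph (made-up): one triangle {1, 2, 3} plus a dangling edge 3-4.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
triangle = {("a", "b"), ("b", "c"), ("a", "c")}
matches = match(adj, triangle, ["a", "b", "c"])
```

Any ordering of the query vertices returns the same set of matches, but the sizes of the intermediate partial matches differ between orderings. Choosing a good ordering is the problem the abstract says the cost-based optimizer addresses.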
Joining Entities Across Relation and Graph with a Unified Model
This paper introduces the RG (Relational Genetic) model, a revised relational
model to represent graph-structured data in RDBMS while preserving its
topology, for efficiently and effectively extracting data in different formats
from disparate sources. The model comes with: (a) SQL, an SQL dialect augmented
with graph pattern queries and tuple-vertex joins, such that one can extract
graph properties via graph pattern matching, and "semantically" match entities
across relations and graphs; (b) a logical representation of graphs in RDBMS,
which introduces an exploration operator for efficient pattern querying and
also supports browsing and updating graph-structured data; and (c) a strategy
to uniformly evaluate SQL, pattern and hybrid queries that join tuples and
vertices, all inside an RDBMS by leveraging its optimizer, without the performance
degradation of switching between different execution engines. A lightweight system,
WhiteDB, is developed as an implementation to evaluate the benefits it can
actually bring on real-life data. We empirically verified that the RG model
enables graph pattern queries to be answered as efficiently as in native
graph engines; can consider accesses to graphs and relations in any order to
find an optimal plan; and supports effective data enrichment.
Comment: 24 pages, 16 figures, 5 tables