Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins
We study the problem of optimizing subgraph queries using the new worst-case
optimal join plans. Worst-case optimal plans evaluate queries by matching one
query vertex at a time using multiway intersections. The core problem in
optimizing worst-case optimal plans is to pick an ordering of the query
vertices to match. We design a cost-based optimizer that (i) picks efficient
query vertex orderings for worst-case optimal plans; and (ii) generates hybrid
plans that mix traditional binary joins with worst-case optimal style multiway
intersections. Our cost metric combines the cost of binary joins with a new
cost metric called intersection-cost. The plan space of our optimizer contains
plans that are not in the plan spaces based on tree decompositions from prior
work. In addition to our optimizer, we describe an adaptive technique that
changes the orderings of the worst-case optimal sub-plans during query
execution. We demonstrate the effectiveness of the plans our optimizer picks
and of the adaptive technique through extensive experiments. Our optimizer is
integrated into the Graphflow DBMS.
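The one-query-vertex-at-a-time evaluation described above can be illustrated with a small sketch. It assumes undirected, unlabeled graphs given as edge lists. The function names and the `cost` counter (total size of the adjacency lists fed into intersections) are hypothetical. The counter is only a rough proxy in the spirit of intersection-cost, not Graphflow's actual cost model.

```python
from collections import defaultdict


def undirected(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def wcoj_match(q_edges, order, d_edges):
    """Match query vertices one at a time in `order` using multiway intersections."""
    adj, q_adj = undirected(d_edges), undirected(q_edges)
    data_vertices = set(adj)
    cost = 0  # hypothetical proxy: total size of adjacency lists fed to intersections
    partial = [{}]
    for qv in order:
        extended = []
        for m in partial:
            # Data vertices already bound to query neighbors of qv.
            bound = [m[w] for w in q_adj[qv] if w in m]
            if bound:
                lists = [adj[x] for x in bound]
                cost += sum(len(lst) for lst in lists)
                cands = set.intersection(*lists)  # multiway intersection
            else:
                cands = data_vertices  # first query vertex: scan every data vertex
            used = set(m.values())
            for v in cands - used:
                extended.append({**m, qv: v})
        partial = extended
    return partial, cost
```

Changing `order` never changes the set of matches, but it can change how many partial matches are built and how much intersection work is done. That is why choosing the ordering is the central optimization problem.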
SubGraph2Vec: Highly-Vectorized Tree-like Subgraph Counting
Subgraph counting aims to count occurrences of a template T in a given
network G(V, E). It is a powerful graph analysis tool and has found real-world
applications in diverse domains. Scaling subgraph counting is known to be
memory-bound and computationally challenging, with exponential complexity.
Although scalable parallel algorithms are known for several graph problems such
as Triangle Counting and PageRank, they are uncommon for counting complex
subgraphs. Here we address this challenge and study connected acyclic graphs, or
trees. We propose a novel vectorized subgraph counting algorithm, named
Subgraph2Vec, together with shared-memory and distributed implementations.
Subgraph2Vec 1) reduces algorithmic complexity by minimizing neighbor traversal;
2) achieves a highly vectorized implementation built on linear algebra kernels,
which significantly improves performance and hardware utilization; 3) improves
overall performance over the state-of-the-art work by orders of magnitude, up to
660x on a single node; 4) in distributed mode, scales the template size up to 20
while maintaining good strong scalability; and 5) is portable to both CPU and GPU.
Comment: arXiv admin note: text overlap with arXiv:1903.0439
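The abstract does not spell out the counting algorithm. As a point of reference only, the sketch below uses color coding (Alon, Yuster, and Zwick), a standard technique for tree templates, applied to the simplest trees: paths on k vertices. It is an assumption, not Subgraph2Vec's published method, and the function names are hypothetical. It shows how each dynamic-programming step reduces to an adjacency-matrix/vector product, the kind of linear-algebra kernel a vectorized implementation builds on.

```python
import itertools
import math

import numpy as np


def count_colorful_paths(adj: np.ndarray, colors: np.ndarray, k: int) -> int:
    """Count paths on k vertices whose vertices carry k distinct colors."""
    n = adj.shape[0]
    # table[mask][v]: colorful paths ending at v that use exactly the colors in mask.
    table = {1 << c: (colors == c).astype(np.int64) for c in range(k)}
    for size in range(2, k + 1):
        for subset in itertools.combinations(range(k), size):
            mask = sum(1 << c for c in subset)
            vec = np.zeros(n, dtype=np.int64)
            for c in subset:
                # Extend paths over colors mask \ {c} by one edge to a vertex of color c.
                extended = adj @ table[mask ^ (1 << c)]
                vec += np.where(colors == c, extended, 0)
            table[mask] = vec
    total = int(table[(1 << k) - 1].sum())
    # Each undirected path is found once from each of its two endpoints.
    return total // 2 if k > 1 else total


def estimate_path_count(adj: np.ndarray, k: int, trials: int = 100, seed: int = 0) -> float:
    """Color-coding estimate of the number of k-vertex paths in the graph."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Probability that a fixed k-vertex path is colorful under a uniform random coloring.
    p_colorful = math.factorial(k) / k**k
    hits = [count_colorful_paths(adj, rng.integers(0, k, size=n), k) for _ in range(trials)]
    return float(np.mean(hits)) / p_colorful
```

Averaging the colorful counts over random colorings and dividing by k!/k^k gives an unbiased estimate of the true count; more trials reduce the variance. In a sparse, large-graph setting, `adj @ vector` would be a sparse matrix-vector product.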
l2Match: Optimization Techniques on Subgraph Matching Algorithm using Label Pair, Neighboring Label Index, and Jump-Redo method
A graph database is designed to store bidirectional relationships between
objects and to facilitate the traversal process used to extract a subgraph.
However, subgraph matching is an NP-complete problem. Existing solutions to
this problem usually employ a filter-and-verification framework and a
divide-and-conquer method. The filter-and-verification framework minimizes the
number of inputs to the verification stage by filtering and pruning as many
invalid candidates as possible. Meanwhile, in the divide-and-conquer method,
subgraph matching is performed on substructures decomposed from the larger
graph to yield partial embeddings. Subsequently, recursive traversal or set
intersection combines the partial embeddings into a complete subgraph. In this
paper, we first present a comprehensive literature review of the
state-of-the-art solutions. We then propose l2Match, a subgraph isomorphism
algorithm for small queries that uses a Label-Pair Index and filtering method,
as a proof of concept. Empirical experiments show that l2Match outperforms
related state-of-the-art solutions and that the proposed methods optimize the
existing algorithms.
Comment: This short version of this article (6 pages) has been accepted by ICEIC 202
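The filter-and-verification framework described above can be sketched as follows, assuming small labeled, undirected graphs. Everything here is hypothetical and illustrative: the `label_pair_index` structure, the neighbor-label filtering rule, and the backtracking order are generic choices, not l2Match's actual Label-Pair Index, Neighboring Label Index, or Jump-Redo method.

```python
from collections import defaultdict


def adjacency(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def label_pair_index(edges, labels):
    """Hypothetical label-pair index: unordered label pair -> data edges with that pair."""
    index = defaultdict(set)
    for a, b in edges:
        index[frozenset((labels[a], labels[b]))].add((a, b))
    return index


def filter_candidates(q_edges, q_labels, d_edges, d_labels):
    q_adj, d_adj = adjacency(q_edges), adjacency(d_edges)
    pairs = label_pair_index(d_edges, d_labels)
    # A query edge whose label pair never occurs in the data graph cannot be matched.
    if any(frozenset((q_labels[a], q_labels[b])) not in pairs for a, b in q_edges):
        return None
    cands = {}
    for u, lu in q_labels.items():
        needed = {q_labels[w] for w in q_adj[u]}  # labels u's data image must see nearby
        cands[u] = {
            v for v, lv in d_labels.items()
            if lv == lu
            and len(d_adj[v]) >= len(q_adj[u])
            and needed <= {d_labels[x] for x in d_adj[v]}
        }
    return cands


def match(q_edges, q_labels, d_edges, d_labels):
    """Filter, then verify by backtracking (non-induced subgraph isomorphism)."""
    cands = filter_candidates(q_edges, q_labels, d_edges, d_labels)
    if cands is None or any(not c for c in cands.values()):
        return []
    q_adj, d_adj = adjacency(q_edges), adjacency(d_edges)
    order = sorted(q_labels, key=lambda u: len(cands[u]))  # fewest candidates first
    results = []

    def backtrack(i, mapping, used):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in sorted(cands[u] - used):
            # Every already-mapped query neighbor of u must map to a data neighbor of v.
            if all(mapping[w] in d_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                backtrack(i + 1, mapping, used | {v})
                del mapping[u]

    backtrack(0, {}, frozenset())
    return results
```

The filtering stage is cheap and runs once per query vertex; only the surviving candidates reach the exponential-time verification stage, which is the point of the framework.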
Efficient access methods for very large distributed graph databases
Subgraph searching is an essential problem in graph databases, but it is also challenging because it involves subgraph isomorphism, an NP-complete sub-problem. Filter-Then-Verify (FTV) methods mitigate performance overheads by using an index to prune out graphs that do not fit the query in a filtering stage, reducing the number of subgraph isomorphism evaluations in a subsequent verification stage. Subgraph searching has to be applied to very large databases (tens of millions of graphs) in real applications such as molecular substructure searching. Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs); however, they cannot reach reasonable performance on very large ones (tens of millions of graphs). This paper proposes a generic approach for the distributed implementation of FTV solutions. In addition, three previous methods that improve the performance of GGSX and CT-Index are adapted to be executed in clusters. The evaluation shows how the achieved solutions provide a large performance improvement (a 70% to 90% reduction in filtering time) in a centralized configuration, and how they may be used to achieve efficient subgraph searching over very large databases in cluster configurations.
This work has been co-funded by the Ministerio de Economía y Competitividad of the Spanish government, and by Mestrelab Research S.L. through the project NEXTCHROM (RTC-2015-3812-2) of the call Retos-Colaboración of the program Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad. The authors wish to thank the financial support provided by Xunta de Galicia under the Project ED431B 2018/28S
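The filtering stage of a distributed FTV approach can be sketched as follows. The feature scheme (counts of label sequences along simple paths) is only loosely modeled on path-based indexes such as GGSX; this version uses a plain `Counter` rather than a suffix tree. A thread pool stands in for cluster nodes, and every function name is hypothetical rather than part of the paper's system.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def to_adj(labels, edges):
    adj = {v: [] for v in labels}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def path_features(labels, edges, max_len):
    """Count label sequences of simple paths with up to max_len edges."""
    adj = to_adj(labels, edges)
    feats = Counter()

    def walk(path):
        seq = tuple(labels[v] for v in path)
        feats[min(seq, seq[::-1])] += 1  # direction-independent key
        if len(path) - 1 < max_len:
            for u in adj[path[-1]]:
                if u not in path:
                    walk(path + [u])

    for v in adj:
        walk([v])
    return feats


def filter_shard(index, query_feats):
    # Keep graphs that contain every query feature at least as often as the query does.
    return [gid for gid, feats in index.items()
            if all(feats[f] >= n for f, n in query_feats.items())]


def distributed_filter(shards, query, max_len=2):
    """shards: list of {graph_id: (labels, edges)}; each shard stands in for one node."""
    query_feats = path_features(*query, max_len)
    indexes = [{gid: path_features(lab, e, max_len) for gid, (lab, e) in shard.items()}
               for shard in shards]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(filter_shard, indexes, [query_feats] * len(indexes)))
    # Survivors still need subgraph-isomorphism verification.
    return sorted(gid for part in parts for gid in part)
```

The filter is safe: if the query occurs in a graph, every query path maps to a distinct data path with the same labels, so no true answer is pruned. Because each shard is filtered independently, the work partitions naturally across nodes.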
Symmetric continuous subgraph matching with bidirectional dynamic programming
In many real datasets such as social media streams and cyber data sources, graphs change over time through a graph update stream of edge insertions and deletions. Detecting critical patterns in such dynamic graphs plays an important role in various application domains such as fraud detection, cyber security, and recommendation systems for social networks. Given a dynamic data graph and a query graph, the continuous subgraph matching problem is to find all positive matches for each edge insertion and all negative matches for each edge deletion. The state-of-the-art algorithm TurboFlux uses a spanning tree of the query graph for filtering. However, a spanning tree may have low pruning power because it does not take all edges of the query graph into account. In this paper, we present SymBi, a symmetric and much faster algorithm that maintains an auxiliary data structure based on a directed acyclic graph instead of a spanning tree; this structure stores the intermediate results of bidirectional dynamic programming between the query graph and the dynamic graph. Extensive experiments with real and synthetic datasets show that SymBi outperforms the state-of-the-art algorithm by up to three orders of magnitude in terms of elapsed time.
Min, S.; Park, S. G.; Park, K.; Giammarresi, D.; Italiano, G. F.; Han, W.-S.
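The definitions of positive and negative matches can be made concrete with a deliberately naive sketch, assuming an unlabeled triangle query over an undirected edge stream. The `TriangleStream` class is hypothetical. It recomputes matches around each updated edge and maintains no auxiliary structure, so it illustrates the problem statement rather than SymBi's DAG-based dynamic programming. Matches are reported as vertex sets, not as distinct embeddings.

```python
from collections import defaultdict


class TriangleStream:
    """Hypothetical baseline: continuous matching of a triangle query on an edge stream."""

    def __init__(self):
        self.adj = defaultdict(set)

    def _matches_through(self, u, v):
        # Triangles containing edge (u, v) are exactly the common neighbors of u and v.
        return [tuple(sorted((u, v, w))) for w in sorted(self.adj[u] & self.adj[v])]

    def insert(self, u, v):
        """Apply an insertion and return its positive matches."""
        if u == v or v in self.adj[u]:
            return []
        self.adj[u].add(v)
        self.adj[v].add(u)
        return self._matches_through(u, v)

    def delete(self, u, v):
        """Return the negative matches of a deletion, then apply it."""
        if v not in self.adj[u]:
            return []
        negatives = self._matches_through(u, v)
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        return negatives
```

Negative matches must be computed before the edge is removed, since afterwards the matches it participated in no longer exist. For larger queries, this local recomputation is what filtering structures such as spanning trees or DAGs aim to prune.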
DPQP: A D-representation-based Pipelined Factorized Query Processor for Graph Database Management Systems
Factorized databases use factorized data representations during query processing to obtain more compact final query results and faster runtimes for queries with many-to-many joins. We revisit this technique in the context of graph database management systems (GDBMSs), whose common workloads are large joins with many-to-many relationships on graph-structured data. We first review the theory of factorized databases and the classic flat intermediate tuple structure in traditional pipelined GDBMSs. We then present our tuple representation, which mimics factorized representations and can be easily integrated into existing query processors. We further describe how to cache sub-query results with this factorized tuple structure through a static dependency analysis of the query. We have integrated our factorized query processor into GraphflowDB, an in-memory GDBMS. Compared to the original version of GraphflowDB, whose processor is not fully factorized, query plans in our processor can be orders of magnitude faster and produce orders of magnitude smaller result sizes.
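Why factorization shrinks many-to-many join results can be seen in a small sketch for a two-hop pattern a->b->c over a directed edge list. The function names are hypothetical. Grouping by the middle vertex is only the simplest factorized form (a union, over each value of b, of a product {a} x {b} x {c}); it is not DPQP's d-representation or its tuple structure.

```python
from collections import defaultdict


def two_hop_flat(edges):
    """Flat pipeline output for the pattern a->b->c: one tuple per match."""
    out = defaultdict(list)
    for s, d in edges:
        out[s].append(d)
    return [(a, b, c) for a, b in edges for c in out[b]]


def two_hop_factorized(edges):
    """Factorized output: per middle vertex b, {a} x {b} x {c} kept as two lists."""
    inc, out = defaultdict(list), defaultdict(list)
    for s, d in edges:
        out[s].append(d)
        inc[d].append(s)
    return {b: (inc[b], out[b]) for b in inc if b in out}


def count_matches(factorized):
    # |{a}| * |{c}| per b, without materializing the product.
    return sum(len(ins) * len(outs) for ins, outs in factorized.values())


def expand(factorized):
    # Flatten only when a consumer actually needs individual tuples.
    return [(a, b, c) for b, (ins, outs) in factorized.items() for a in ins for c in outs]
```

For a hub with 10 in-neighbors and 10 out-neighbors, the flat result holds 100 tuples, while the factorized form stores 20 values and still answers a count query directly.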
GraphMatch: Subgraph Query Processing on FPGAs
Efficiently finding subgraph embeddings in large graphs is crucial for many
application areas like biology and social network analysis. Set intersections
are the predominant and most challenging aspect of current join-based subgraph
query processing systems for CPUs. Previous work has shown the viability of
utilizing FPGAs for acceleration of graph and join processing.
In this work, we propose GraphMatch, the first general-purpose stand-alone
subgraph query processing accelerator based on worst-case optimal joins (WCOJ)
that is designed entirely for modern field-programmable gate array (FPGA)
hardware. For efficient processing of various graph data sets and query graph
patterns, it leverages a novel set intersection approach, called AllCompare,
tailor-made for FPGAs. We show that this set intersection approach efficiently
solves multi-set intersections in subgraph query processing and is superior to
CPU-based approaches. Overall, GraphMatch achieves speedups of over 2.68x and
5.16x compared to the state-of-the-art systems GraphFlow and RapidMatch,
respectively.
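The abstract does not describe how AllCompare works. The sketch below is a software emulation of one plausible reading of the name: tiles of two sorted neighbor lists are compared all-to-all, the way a grid of hardware comparators could do in a single step. This reading is an assumption, not the paper's design, and both function names are hypothetical. NumPy broadcasting plays the role of the comparator grid.

```python
from functools import reduce

import numpy as np


def all_compare_intersect(a, b, block=8):
    """Intersect two sorted ID lists by comparing block x block tiles all-to-all."""
    a, b = np.asarray(a), np.asarray(b)
    hits = []
    for i in range(0, len(a), block):
        ablk = a[i:i + block]
        for j in range(0, len(b), block):
            bblk = b[j:j + block]
            # Sorted inputs let us skip tiles whose value ranges cannot overlap.
            if bblk[-1] < ablk[0] or bblk[0] > ablk[-1]:
                continue
            eq = ablk[:, None] == bblk[None, :]  # every pair compared in one step
            hits.append(ablk[eq.any(axis=1)])
    if not hits:
        return a[:0]  # empty result with a's dtype
    return np.unique(np.concatenate(hits))


def multiway_intersect(lists, block=8):
    """WCOJ-style multiway intersection as a chain of pairwise tile intersections."""
    return reduce(lambda x, y: all_compare_intersect(x, y, block), lists)
```

An all-pairs comparison does quadratic work per tile, but it has no data-dependent branching inside the tile. That trade-off favors wide parallel hardware over the merge-based intersections typical on CPUs.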
- …