77 research outputs found

    Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins

    Full text link
    We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We design a cost-based optimizer that (i) picks efficient query vertex orderings for worst-case optimal plans; and (ii) generates hybrid plans that mix traditional binary joins with worst-case optimal style multiway intersections. Our cost metric combines the cost of binary joins with a new cost metric called intersection-cost. The plan space of our optimizer contains plans that are not in the plan spaces based on tree decompositions from prior work. In addition to our optimizer, we describe an adaptive technique that changes the orderings of the worst-case optimal sub-plans during query execution. We demonstrate the effectiveness of the plans our optimizer picks and adaptive technique through extensive experiments. Our optimizer is integrated into the Graphflow DBMS

    SubGraph2Vec: Highly-Vectorized Tree-likeSubgraph Counting

    Full text link
    Subgraph counting aims to count occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging with exponential complexity. Although scalable parallel algorithms are known for several graph problems such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic graphs or trees. We propose a novel vectorized subgraph counting algorithm, named Subgraph2Vec, as well as both shared memory and distributed implementations: 1) reducing algorithmic complexity by minimizing neighbor traversal; 2) achieving a highly-vectorized implementation upon linear algebra kernels to significantly improve performance and hardware utilization. 3) Subgraph2Vec improves the overall performance over the state-of-the-art work by orders of magnitude and up to 660x on a single node. 4) Subgraph2Vec in distributed mode can scale up the template size to 20 and maintain good strong scalability. 5) enabling portability to both CPU and GPU.Comment: arXiv admin note: text overlap with arXiv:1903.0439

    l2Match: Optimization Techniques on Subgraph Matching Algorithm using Label Pair, Neighboring Label Index, and Jump-Redo method

    Full text link
    Graph database is designed to store bidirectional relationships between objects and facilitate the traversal process to extract a subgraph. However, the subgraph matching process is an NP-Complete problem. Existing solutions to this problem usually employ a filter-and-verification framework and a divide-and-conquer method. The filter-and-verification framework minimizes the number of inputs to the verification stage by filtering and pruning invalid candidates as much as possible. Meanwhile, subgraph matching is performed on the substructure decomposed from the larger graph to yield partial embedding. Subsequently, the recursive traversal or set intersection technique combines the partial embedding into a complete subgraph. In this paper, we first present a comprehensive literature review of the state-of-the-art solutions. l2Match, a subgraph isomorphism algorithm for small queries utilizing a Label-Pair Index and filtering method, is then proposed and presented as a proof of concept. Empirical experimentation shows that l2Match outperforms related state-of-the-art solutions, and the proposed methods optimize the existing algorithms.Comment: This short version of this article (6 pages) is accepted by ICEIC 202

    Efficient access methods for very large distributed graph databases

    Get PDF
    Subgraph searching is an essential problem in graph databases, but it is also challenging due to the involved subgraph isomorphism NP-Complete sub-problem. Filter-Then-Verify (FTV) methods mitigate performance overheads by using an index to prune out graphs that do not fit the query in a filtering stage, reducing the number of subgraph isomorphism evaluations in a subsequent verification stage. Subgraph searching has to be applied to very large databases (tens of millions of graphs) in real applications such as molecular substructure searching. Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs), however they cannot reach reasonable performance on very large ones (tens of millions graphs). This paper proposes a generic approach for the distributed implementation of FTV solutions. Besides, three previous methods that improve the performance of GGSX and CT-Index are adapted to be executed in clusters. The evaluation shows how the achieved solutions provide a great performance improvement (between 70% and 90% of filtering time reduction) in a centralized configuration and how they may be used to achieve efficient subgraph searching over very large databases in cluster configurationsThis work has been co-funded by the Ministerio de Economía y Competitividad of the Spanish government, and by Mestrelab Research S.L. through the project NEXTCHROM (RTC-2015-3812-2) of the call Retos-Colaboración of the program Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad. The authors wish to thank the financial support provided by Xunta de Galicia under the Project ED431B 2018/28S

    Symmetric continuous subgraph matching with bidirectional dynamic programming

    Get PDF
    6sìopenIn many real datasets such as social media streams and cyber data sources, graphs change over time through a graph update stream of edge insertions and deletions. Detecting critical patterns in such dynamic graphs plays an important role in various application domains such as fraud detection, cyber security, and recommendation systems for social networks. Given a dynamic data graph and a query graph, the continuous subgraph matching problem is to find all positive matches for each edge insertion and all negative matches for each edge deletion. The state-of-the-art algorithm TurboFlux uses a spanning tree of a query graph for filtering. However, using the spanning tree may have a low pruning power because it does not take into account all edges of the query graph. In this paper, we present a symmetric and much faster algorithm SymBi which maintains an auxiliary data structure based on a directed acyclic graph instead of a spanning tree, which maintains the intermediate results of bidirectional dynamic programming between the query graph and the dynamic graph. Extensive experiments with real and synthetic datasets show that SymBi outperforms the state-of-the-art algorithm by up to three orders of magnitude in terms of the elapsed time.openAccademicoMin S.; Park S.G.; Park K.; Giammarresi D.; Italiano G.F.; Han W.-S.Min, S.; Park, S. G.; Park, K.; Giammarresi, D.; Italiano, G. F.; Han, W. -S

    DPQP: A D-representation-based Pipelined Factorized Query Processor for Graph Database Management Systems

    Get PDF
    Factorized databases utilize factorized data representations during query processing to obtain more compact final query results and faster runtimes for queries with many-to-many joins. We revisit this technique in the context of graph database management systems (GDBMSs) whose common workloads are large joins with many-to-many relationships on graph-structured data. We first review the theory of factorized databases and classic flat intermediate tuple structure in traditional pipelined GDBMSs. We then present our tuple representation which mimics factorized representations and can be easily integrated into existing query processors. We further describe how to cache sub-query results with this factorized tuple structure through a static dependency analysis of the query. We have integrated our factorized query processor into GraphflowDB, an in-memory GDBMS. Compared to the original version of GraphflowDB, whose processor is not fully factorized, query plans in our processor can be orders of magnitude faster and produce orders of magnitude smaller result sizes

    GraphMatch: Subgraph Query Processing on FPGAs

    Full text link
    Efficiently finding subgraph embeddings in large graphs is crucial for many application areas like biology and social network analysis. Set intersections are the predominant and most challenging aspect of current join-based subgraph query processing systems for CPUs. Previous work has shown the viability of utilizing FPGAs for acceleration of graph and join processing. In this work, we propose GraphMatch, the first genearl-purpose stand-alone subgraph query processing accelerator based on worst-case optimal joins (WCOJ) that is fully designed for modern, field programmable gate array (FPGA) hardware. For efficient processing of various graph data sets and query graph patterns, it leverages a novel set intersection approach, called AllCompare, tailor-made for FPGAs. We show that this set intersection approach efficiently solves multi-set intersections in subgraph query processing, superior to CPU-based approaches. Overall, GraphMatch achieves a speedup of over 2.68x and 5.16x, compared to the state-of-the-art systems GraphFlow and RapidMatch, respectively
    corecore