
    iTurboGraph: Scaling and Automating Incremental Graph Analytics

    With the rise of streaming data for dynamic graphs, large-scale graph analytics faces a new requirement: incremental computation, because the larger the graph, the higher the cost of updating the analytics results by re-execution. A dynamic graph consists of an initial graph G and graph mutation updates ∆G of edge insertions or deletions. Given a query Q, its results Q(G), and the updates ∆G to G, incremental graph analytics computes updates ∆Q such that Q(G ∪ ∆G) = Q(G) ∪ ∆Q, where ∪ is a union operator. In this paper, we consider the problem of large-scale incremental neighbor-centric graph analytics (NGA). We address two limitations of previous systems: a lack of usability, due to the difficulty of programming incremental algorithms for NGA, and limited scalability and efficiency, due to the overhead of maintaining intermediate results for graph traversals in NGA. First, we propose a domain-specific language, L_NGA, and develop its compiler for intuitive programming of NGA, automatic query incrementalization, and query optimization. Second, we define Graph Streaming Algebra as a theoretical foundation for scalable processing of incremental NGA. We introduce the concept of Nested Graph Windows and model graph traversals as the generation of walk streams. Lastly, we present iTurboGraph, a system that efficiently processes incremental NGA for large graphs. Comprehensive experiments show that it effectively avoids costly re-executions and efficiently updates the analytics results with reduced IO and computation.
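    As a minimal sketch of the incremental contract Q(G ∪ ∆G) = Q(G) ∪ ∆Q, assume Q is global triangle counting on an undirected graph, a simple neighbor-centric query for which the union operator reduces to adding counts. This is an illustrative example only, not iTurboGraph's L_NGA or its algorithms; the functions `count_triangles`, `insert_edge`, and `delete_edge` are hypothetical. Each edge update inspects only the neighborhoods of its two endpoints, so computing ∆Q usually costs far less than re-executing Q on the whole updated graph.

```python
from collections import defaultdict
from itertools import combinations


def count_triangles(adj):
    """Full re-execution of Q: each triangle is seen once from each of its 3 vertices."""
    total = 0
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                total += 1
    return total // 3


def insert_edge(adj, u, v):
    """Apply one insertion from ∆G and return ∆Q (triangles closed by the new edge)."""
    if u == v or v in adj[u]:
        return 0  # self-loops and duplicate edges leave Q unchanged
    delta = len(adj[u] & adj[v])  # every common neighbor closes one new triangle
    adj[u].add(v)
    adj[v].add(u)
    return delta


def delete_edge(adj, u, v):
    """Apply one deletion from ∆G and return ∆Q as a negative count."""
    if v not in adj[u]:
        return 0
    adj[u].discard(v)
    adj[v].discard(u)
    return -len(adj[u] & adj[v])  # each remaining common neighbor lost a triangle


# Usage: maintain Q incrementally, starting from the empty graph.
adj = defaultdict(set)
q = 0
for u, v in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
    q += insert_edge(adj, u, v)
print(q, count_triangles(adj))  # 2 2
q += delete_edge(adj, 2, 3)
print(q, count_triangles(adj))  # 0 0
```

    The full recount is used only to check that the incrementally maintained result agrees with re-execution.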

    G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching

    Despite the crucial role of cardinality estimation in query optimization, there has been no systematic and in-depth study of the existing cardinality estimation techniques for subgraph matching queries. In this paper, we present, for the first time, a comprehensive study of these techniques, at a scale far beyond their original experiments. We first introduce a novel framework called G-CARE, on top of which all existing techniques can be realized and which provides insights into their performance. Using G-CARE, we then reimplement representative cardinality estimation techniques for graph databases as well as relational databases. Next, we evaluate the accuracy of these techniques on RDF and non-RDF graphs from different domains, using subgraph matching queries of the various topologies considered so far. Surprisingly, our results reveal that all existing techniques have serious accuracy problems in various scenarios and datasets. Intriguingly, a simple sampling method based on an online aggregation technique designed for relational data consistently outperforms all existing techniques.
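    The abstract does not name the sampling method that performed best. As an assumption, the sketch below illustrates random-walk sampling with inverse-probability weighting, the idea behind online-aggregation join estimators such as Wander Join, applied to a chain query of k directed edges. It estimates the number of homomorphic matches (walks, in which vertices may repeat); the functions `exact_walk_count` and `random_walk_estimate` and the toy graph are hypothetical and not taken from G-CARE.

```python
import random


def exact_walk_count(out_adj, k):
    """Exact number of directed walks with k edges (ground truth for the estimate)."""
    ways = {v: 1 for v in out_adj}  # walks with 0 edges starting at v
    for _ in range(k):
        ways = {v: sum(ways[w] for w in out_adj[v]) for v in out_adj}
    return sum(ways.values())


def random_walk_estimate(out_adj, k, samples, rng):
    """Unbiased estimate: average of 1 / Pr[sampled walk] (0 for failed walks)."""
    edges = [(u, v) for u in out_adj for v in out_adj[u]]
    total = 0.0
    for _ in range(samples):
        _, v = rng.choice(edges)     # first edge drawn with probability 1/|E|
        weight = len(edges)
        for _ in range(k - 1):
            nbrs = out_adj[v]
            if not nbrs:             # dead end: the walk contributes 0
                weight = 0
                break
            weight *= len(nbrs)      # next edge drawn with probability 1/outdeg(v)
            v = rng.choice(nbrs)
        total += weight
    return total / samples


# Usage: every vertex must appear as a key, even with no out-edges.
out_adj = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}
print(exact_walk_count(out_adj, 2))  # 10
print(random_walk_estimate(out_adj, 2, 20000, random.Random(7)))
```

    Each sample costs only k steps, and failed walks are kept as zeros so the estimator stays unbiased; the price is variance, which grows when out-degrees along the walks vary widely.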