13 research outputs found

    A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

    Cyber security is one of the most significant technical challenges in current times. Detecting adversarial activities and preventing the theft of intellectual property and customer data are high priorities for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a "Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named "Relative Selectivity" that is used to select between different query processing strategies. Our experiments on real online news and network traffic streams and on a synthetic social network benchmark demonstrate 10-100x speedups over selectivity-agnostic approaches. Comment: in 18th International Conference on Extending Database Technology (EDBT) (2015)
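
    The idea of letting stream statistics drive the search order can be illustrated with a minimal sketch. It is not the paper's "Lazy Search" algorithm or its "Relative Selectivity" metric, neither of which is defined in the abstract; it only assumes a generic heuristic in which query edges with the rarest edge type in the observed stream are matched first. The function names and the (source, type, target) edge layout are hypothetical.

```python
from collections import Counter

def edge_type_counts(stream):
    """Count how often each edge type appears in the observed stream.

    Each stream element is assumed to be a (source, edge_type, target) tuple.
    """
    return Counter(edge_type for _, edge_type, _ in stream)

def order_query_edges(query_edges, counts):
    """Order query edges from rarest to most frequent edge type.

    Rare edge types are more selective, so matching them first tends to
    prune the search earlier. Types never seen in the stream count as 0.
    """
    return sorted(query_edges, key=lambda edge: counts.get(edge[1], 0))

# Hypothetical toy stream and path query.
stream = [
    ("a", "login", "b"), ("b", "login", "c"), ("a", "login", "d"),
    ("c", "copy", "d"),
    ("d", "send", "e"), ("d", "send", "f"),
]
query = [("x", "login", "y"), ("y", "copy", "z"), ("z", "send", "w")]

plan = order_query_edges(query, edge_type_counts(stream))
```

    In this toy stream "copy" is the rarest type, so the plan starts from the "copy" edge and leaves the frequent "login" edge for last.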

    DDSL: Efficient Subgraph Listing on Distributed and Dynamic Graphs

    Subgraph listing is a fundamental problem in graph theory and has wide applications in areas like sociology, chemistry, and social networks. Modern graphs are often both large-scale and highly dynamic, which challenges the efficiency of existing subgraph listing algorithms. Recent works have shown the benefits of partitioning and processing big graphs in a distributed system; however, few works target subgraph listing on dynamic graphs in a distributed environment. In this paper, we propose an efficient approach, called Distributed and Dynamic Subgraph Listing (DDSL), which can incrementally update the results instead of running from scratch. DDSL follows a general distributed join framework. In this framework, we use a Neighbor-Preserved storage for data graphs, which takes bounded extra space and supports dynamic updating. After that, we propose a comprehensive cost model to estimate the I/O cost of listing subgraphs. Then, based on this cost model, we develop an algorithm to find the optimal join tree for a given pattern. To handle dynamic graphs, we propose an efficient left-deep join algorithm to incrementally update the join results. Extensive experiments are conducted on real-world datasets. The results show that DDSL outperforms existing methods in dealing with both static and dynamic graphs in terms of response time.
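
    To make "incrementally update the results instead of running from scratch" concrete, here is a minimal single-machine sketch for the simplest pattern, the triangle. It is not DDSL's distributed join framework, storage scheme, or cost model; it only assumes the standard observation that inserting or deleting an edge (u, v) changes exactly the triangles formed with the common neighbors of u and v. The class name is hypothetical.

```python
from collections import defaultdict

class IncrementalTriangles:
    """Maintain the set of triangles of an undirected graph under edge updates."""

    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of neighbors
        self.triangles = set()        # each triangle stored as a sorted tuple

    def add_edge(self, u, v):
        """Insert edge (u, v) and return only the newly created triangles."""
        if u == v or v in self.adj[u]:
            return set()
        # Every common neighbor w closes a new triangle (u, v, w).
        new = {tuple(sorted((u, v, w))) for w in self.adj[u] & self.adj[v]}
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles |= new
        return new

    def remove_edge(self, u, v):
        """Delete edge (u, v) and return only the triangles it destroys."""
        if v not in self.adj[u]:
            return set()
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        gone = {tuple(sorted((u, v, w))) for w in self.adj[u] & self.adj[v]}
        self.triangles -= gone
        return gone

# Hypothetical usage on a small graph.
g = IncrementalTriangles()
for a, b in [(1, 2), (2, 3), (3, 4)]:
    g.add_edge(a, b)
created = g.add_edge(1, 3)   # closes the triangle on vertices 1, 2, 3
```

    Each update touches only the neighborhoods of the two endpoints, so the cost depends on their degrees rather than on the size of the whole graph.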

    Continuous pattern detection over billion-edge graph using distributed framework

    Continuous pattern detection plays an important role in monitoring-related applications. The large size and dynamic updates of graphs, along with the massive search space, pose huge challenges in developing an efficient continuous pattern detection system. In this paper, we leverage a distributed graph processing framework to approximately detect a given pattern over a large dynamic graph. We aim to improve the scalability and precision, and reduce the response time and message cost in the detection. We convert a given query pattern into a Single-Sink DAG (Directed Acyclic Graph), and propose an evaluation plan with message transitions on the DAG, abbreviated as the SSD plan, to detect the pattern in a large dynamic graph. The SSD plan can guide the data graph exploration via messages, and the messages converge at data sink vertices, which then detect the existence of the query pattern. We also conduct join operations over partial vertices during the graph exploration to improve the precision of pattern detection. In addition, we show that the SSD plan can support continuous queries over dynamic graphs with slight extensions. We further design various sink vertex selection strategies and neighborhood-based transition rule attachment to lower the evaluation cost. The experiments on billion-edge real-life graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method. © 2014 IEEE.