
    Graph pattern matching on social network analysis

    Graph pattern matching is fundamental to social network analysis. Its effectiveness for identifying social communities and social positions, making recommendations and so on has been repeatedly demonstrated. However, social network analysis raises new challenges for graph pattern matching. As real-life social graphs are typically large, it is often prohibitively expensive to conduct graph pattern matching over them: subgraph isomorphism is NP-complete, bounded simulation takes cubic time, and simulation takes quadratic time. These costs hinder the applicability of graph pattern matching to social network analysis. In response to these challenges, the thesis presents a series of effective techniques for querying large, dynamic, and distributively stored social networks. First, we propose a notion of query preserving graph compression, which compresses large social graphs relative to a class Q of queries. We then develop both batch and incremental compression strategies for two commonly used pattern queries. Via both theoretical analysis and experimental studies, we show that (1) using compressed graphs Gr benefits graph pattern matching dramatically; and (2) Gr can be computed and maintained efficiently. Second, we investigate the distributed graph pattern matching problem, and explore parallel computation for graph pattern matching. We show that our techniques possess the following performance guarantees: (1) each site is visited only once; (2) the total network traffic is independent of the size of G; and (3) the response time is decided by the size of the largest fragment of G rather than the size of the entire G. Furthermore, we show how these distributed algorithms can be implemented in the MapReduce framework. Third, we study the problem of answering graph pattern matching using views, since view-based techniques have proven effective for speeding up query evaluation.
We propose a notion of pattern containment to characterise graph pattern matching using views, and introduce efficient algorithms to answer graph pattern matching using views. Moreover, we identify three problems related to graph pattern containment, and provide efficient algorithms for containment checking (with approximation when the problem is intractable). Fourth, we revise graph pattern matching by supporting a designated output node, which we treat as the “query focus”. We then introduce algorithms, with the early termination property, for computing the top-k relevant matches w.r.t. the output node for both acyclic and cyclic pattern graphs. Furthermore, we investigate the diversified top-k matching problem, and develop an approximation algorithm with a performance guarantee and a heuristic algorithm with the early termination property. Finally, we introduce an expert search system, called ExpFinder, for large and dynamic social networks. ExpFinder identifies top-k experts in social networks by graph pattern matching, and copes with the sheer size of real-life social networks by integrating incremental graph pattern matching, query preserving compression and top-k matching computation. In particular, we also introduce bounded (resp. unbounded) incremental algorithms to maintain the weighted landmark vectors, which are used for incremental maintenance of cached results.
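The abstract above refers to graph simulation without defining it. As an illustration only, the following is a minimal sketch of plain graph simulation computed by naive fixpoint refinement. It is not the thesis's algorithm and does not achieve the quadratic bound quoted in the abstract; the function name `graph_simulation` and the dictionary-based graph encoding are assumptions made for this example.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum simulation relation as {pattern_node: set(data_nodes)}.

    q_labels / g_labels map node -> label; q_edges / g_edges are lists of
    (source, target) pairs. Returns {} if some pattern node has no match.
    Hypothetical helper: a naive fixpoint, not an optimized algorithm.
    """
    # Build successor sets for the pattern and the data graph.
    q_succ = {u: set() for u in q_labels}
    for u, v in q_edges:
        q_succ[u].add(v)
    g_succ = {x: set() for x in g_labels}
    for x, y in g_edges:
        g_succ[x].add(y)

    # Start from all label-compatible candidates.
    sim = {u: {x for x in g_labels if g_labels[x] == q_labels[u]}
           for u in q_labels}

    # Repeatedly drop candidates that cannot follow some pattern edge.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for x in list(sim[u]):
                # x survives only if, for every pattern edge (u, v),
                # x has at least one successor that still simulates v.
                if any(not (g_succ[x] & sim[v]) for v in q_succ[u]):
                    sim[u].discard(x)
                    changed = True

    if any(not candidates for candidates in sim.values()):
        return {}
    return sim


# Toy example: a project manager ("PM") who supervises a database developer ("DB").
pattern_labels = {"pm": "PM", "db": "DB"}
pattern_edges = [("pm", "db")]
data_labels = {1: "PM", 2: "DB", 3: "PM", 4: "UI"}
data_edges = [(1, 2), (3, 4)]
result = graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges)
```

In this toy graph, node 3 is dropped because its only successor is labeled "UI", so `result` maps `"pm"` to `{1}` and `"db"` to `{2}`.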

    Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

    Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to select a value of k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) the fact that users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times on the order of milliseconds for tree patterns on large networks with millions of nodes and edges. Comment: To appear in WWW 201
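To make the any-k idea concrete, here is a minimal sketch of ranked, stop-anytime enumeration for the simplest acyclic pattern, a path of labels. It is not KARPET and includes none of its pruning. It is a plain best-first search over partial matches, which returns matches in non-decreasing total weight only under the assumption that edge weights are non-negative. The function name `ranked_path_matches` and the toy data are hypothetical.

```python
import heapq
from itertools import count, islice


def ranked_path_matches(labels, edges, pattern):
    """Lazily yield (total_weight, node_tuple) for label-path matches, lightest first.

    labels:  {node: label}
    edges:   {(source, target): non-negative weight}
    pattern: list of labels, e.g. ["author", "paper", "venue"]
    Hypothetical helper: best-first search, not the KARPET algorithm.
    """
    succ = {}
    for (src, dst), weight in edges.items():
        succ.setdefault(src, []).append((dst, weight))

    tie = count()  # unique tie-breaker so the heap never compares node tuples
    heap = [(0, next(tie), (n,)) for n, lab in labels.items() if lab == pattern[0]]
    heapq.heapify(heap)

    while heap:
        cost, _, partial = heapq.heappop(heap)
        if len(partial) == len(pattern):
            # Every remaining entry costs at least this much, so it is safe to emit.
            yield cost, partial
            continue
        wanted = pattern[len(partial)]
        for nxt, weight in succ.get(partial[-1], []):
            if labels[nxt] == wanted:
                heapq.heappush(heap, (cost + weight, next(tie), partial + (nxt,)))


# Toy heterogeneous network: one author, two papers, one venue.
toy_labels = {"a1": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
toy_edges = {("a1", "p1"): 3, ("a1", "p2"): 1, ("p1", "v1"): 1, ("p2", "v1"): 5}

matches = ranked_path_matches(toy_labels, toy_edges, ["author", "paper", "venue"])
best = list(islice(matches, 1))  # -> [(4, ('a1', 'p1', 'v1'))]
```

Because the function is a generator, the caller does not choose k up front. It can take one result, stop, and later resume the same generator to obtain the next-ranked match.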

    Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data

    As data are increasingly modeled as graphs for expressing complex relationships, the tree pattern query on graph-structured data becomes an important type of query in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In this paper, (1) we propose generalized tree pattern queries (GTPQs) over graph-structured data, which fully support propositional logic of structural constraints. (2) We make a thorough study of fundamental problems including satisfiability, containment and minimization, and analyze the computational complexity and the decision procedures of these problems. (3) We propose a compact graph representation of intermediate results and a pruning approach to reduce the size of intermediate results and the number of join operations -- two factors that often impair the efficiency of traditional algorithms for evaluating tree pattern queries. (4) We present an efficient algorithm for evaluating GTPQs using 3-hop as the underlying reachability index. (5) Experiments on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our algorithm, which is from several times to orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time, even for traditional tree pattern queries with only conjunctive operations. Comment: 16 page
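As a rough illustration of what a structural constraint with AND/OR/NOT can look like, the sketch below evaluates a heavily simplified query: a root label plus a propositional formula over "has a reachable descendant with label X". This is far narrower than the GTPQs described in the abstract, and it uses a plain depth-first traversal rather than a 3-hop reachability index. The tuple encoding of constraints and all function names are assumptions made for this example.

```python
def reachable_labels(succ, labels, start):
    """Return the labels of all nodes reachable from start via one or more edges."""
    seen, found = set(), set()
    stack = list(succ.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        found.add(labels[node])
        stack.extend(succ.get(node, []))
    return found


def holds(constraint, found):
    """Evaluate a nested-tuple constraint against a set of reachable labels.

    Hypothetical encoding: ("has", label), ("not", c), ("and", c1, c2, ...),
    ("or", c1, c2, ...).
    """
    op = constraint[0]
    if op == "has":
        return constraint[1] in found
    if op == "not":
        return not holds(constraint[1], found)
    if op == "and":
        return all(holds(c, found) for c in constraint[1:])
    if op == "or":
        return any(holds(c, found) for c in constraint[1:])
    raise ValueError(f"unknown operator: {op}")


def matching_roots(succ, labels, root_label, constraint):
    """Return, in sorted order, the nodes with root_label that satisfy the constraint."""
    return sorted(
        node for node, label in labels.items()
        if label == root_label
        and holds(constraint, reachable_labels(succ, labels, node))
    )


# Toy data: papers that reach an author but do not reach a retraction notice.
toy_labels = {1: "paper", 2: "paper", 3: "author", 4: "award", 5: "retraction"}
toy_succ = {1: [3, 4], 2: [3, 5]}
query = ("and", ("has", "author"), ("not", ("has", "retraction")))
roots = matching_roots(toy_succ, toy_labels, "paper", query)
```

In the toy data only paper 1 qualifies, since paper 2 reaches a node labeled "retraction". A reachability index would replace the per-root traversal in `reachable_labels`.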

    Investigative Simulation: Towards Utilizing Graph Pattern Matching for Investigative Search

    This paper proposes the use of graph pattern matching for investigative graph search, which is the process of searching for and prioritizing persons of interest who may exhibit part or all of a pattern of suspicious behaviors or connections. While there are a variety of applications, our principal motivation is to aid law enforcement in the detection of homegrown violent extremists. We introduce investigative simulation, which consists of several necessary extensions to the existing dual simulation graph pattern matching scheme in order to make it appropriate for intelligence analysts and law enforcement officials. Specifically, we impose a categorical label structure on nodes consistent with the nature of indicators in investigations, and we prune or complete search results to ensure that partial matches are sensible and useful to analysts. Lastly, we introduce a natural top-k ranking scheme that can help analysts prioritize investigative efforts. We demonstrate the performance of investigative simulation on a large real-world dataset. Comment: 8 pages, 6 figures. Paper to appear in the Fosint-SI 2016 conference proceedings in conjunction with the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining ASONAM 201