Graph pattern matching on social network analysis
Graph pattern matching is fundamental to social network analysis. Its effectiveness
for identifying social communities and social positions, making recommendations and
so on has been repeatedly demonstrated. However, social network analysis raises
new challenges to graph pattern matching. As real-life social graphs are typically
large, it is often prohibitively expensive to conduct graph pattern matching over such
large graphs, e.g., NP-complete for subgraph isomorphism, cubic time for bounded
simulation, and quadratic time for simulation. These costs hinder the applicability of graph
pattern matching to social network analysis. In response to these challenges, the thesis
presents a series of effective techniques for querying large, dynamic, and distributively
stored social networks.
First of all, we propose a notion of query preserving graph compression, to compress
large social graphs relative to a class Q of queries. We then develop both batch
and incremental compression strategies for two commonly used pattern queries. Via
both theoretical analysis and experimental studies, we show that (1) using compressed
graphs Gr benefits graph pattern matching dramatically; and (2) the computation of Gr
as well as its maintenance can be processed efficiently.
Secondly, we investigate the distributed graph pattern matching problem, and explore
parallel computation for graph pattern matching. We show that our techniques
possess the following performance guarantees: (1) each site is visited only once; (2) the total
network traffic is independent of the size of G; and (3) the response time is decided
by the size of the largest fragment of G rather than the size of the entire G. Furthermore, we
show how these distributed algorithms can be implemented in the MapReduce framework.
Thirdly, we study the problem of answering graph pattern matching using views,
since view-based techniques have proven effective for speeding up query
evaluation. We propose a notion of pattern containment to characterise graph pattern
matching using views, and introduce efficient algorithms to answer graph pattern
matching using views. Moreover, we identify three problems related to graph pattern
containment, and provide efficient algorithms for containment checking (approximation
when the problem is intractable).
Fourthly, we revise graph pattern matching by supporting a designated output node,
which we treat as the "query focus". We then introduce algorithms for computing the top-k
relevant matches w.r.t. the output node for both acyclic and cyclic pattern graphs,
with the early termination property. Furthermore, we investigate the diversified
top-k matching problem, and develop an approximation algorithm with a performance
guarantee and a heuristic algorithm with the early termination property.
Finally, we introduce an expert search system, called ExpFinder, for large and dynamic
social networks. ExpFinder identifies top-k experts in social networks by graph
pattern matching, and copes with the sheer size of real-life social networks by integrating
incremental graph pattern matching, query preserving compression and top-k
matching computation. In particular, we also introduce bounded (resp. unbounded)
incremental algorithms to maintain the weighted landmark vectors which are used for
incremental maintenance of cached results.
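
The abstract above cites graph simulation as one of the matching semantics (quadratic time). As a rough illustration only, the following is a minimal sketch of computing a maximum simulation relation by naive fixpoint refinement. It is not the thesis's algorithm; the function name `graph_simulation`, the dictionary-based graph encoding, and the toy data are all assumptions made for this example.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return, for each pattern node, the set of data nodes that simulate it.

    q_labels / g_labels: dict mapping node id -> label (hypothetical encoding).
    q_edges / g_edges:   iterable of directed (source, target) pairs.
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in g_labels}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    # Refine until stable: a candidate v for u survives only if, for every
    # pattern edge (u, u_child), v has a successor that simulates u_child.
    changed = True
    while changed:
        changed = False
        for u, u_child in q_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Toy example (invented data): a project manager (PM) who supervises a
# database developer (DB).
q_labels = {"q_pm": "PM", "q_db": "DB"}
q_edges = [("q_pm", "q_db")]
g_labels = {"p1": "PM", "p2": "PM", "d1": "DB", "d2": "DB"}
g_edges = [("p1", "d1")]

result = graph_simulation(q_labels, q_edges, g_labels, g_edges)
```

In this toy graph, `p2` is dropped because it has no DB successor, while both `d1` and `d2` remain candidates for `q_db`, since simulation only constrains outgoing pattern edges. If any pattern node ends up with an empty set, the data graph does not match the pattern.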
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Many problems in areas as diverse as recommendation systems, social network
analysis, semantic search, and distributed root cause analysis can be modeled
as pattern search on labeled graphs (also called "heterogeneous information
networks" or HINs). Given a large graph and a query pattern with node and edge
label constraints, a fundamental challenge is to find the top-k matches according
to a ranking function over edge and node weights. For users, it is difficult
to select a value for k. We therefore propose the novel notion of an any-k
ranking algorithm: for a given time budget, return as many of the top-ranked
results as possible. Then, given additional time, produce the next lower-ranked
results quickly as well. It can be stopped anytime, but may have to continue
until all results are returned. This paper focuses on acyclic patterns over
arbitrary labeled graphs. We are interested in practical algorithms that
effectively exploit (1) properties of heterogeneous networks, in particular
selective constraints on labels, and (2) the fact that users often explore only a
fraction of the top-ranked results. Our solution, KARPET, carefully integrates
aggressive pruning that leverages the acyclic nature of the query, and
incremental guided search. It enables us to prove strong non-trivial time and
space guarantees, which is generally considered very hard for this type of
graph search problem. Through experimental studies we show that KARPET achieves
running times in the order of milliseconds for tree patterns on large networks
with millions of nodes and edges.
Comment: To appear in WWW 201
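
To make the any-k idea concrete, here is a minimal sketch for the special case of a path pattern (a tree with a single branch). It is not KARPET; it only illustrates the two ingredients the abstract names, pruning and guided incremental search, using a bottom-up pass that computes the cheapest completion from each candidate node followed by a best-first search. The function name `any_k_path_matches`, the graph encoding, and the example data are assumptions for illustration, and lower total weight is assumed to rank higher.

```python
import heapq
from itertools import islice


def any_k_path_matches(labels, edges, pattern):
    """Lazily yield (weight, match) pairs for a path pattern, lightest first.

    labels:  dict node id -> label
    edges:   dict (u, v) -> weight of the directed edge u -> v
    pattern: list of labels, one per position on the path
    """
    succ = {}
    for (u, v), w in edges.items():
        succ.setdefault(u, []).append((v, w))

    n = len(pattern)
    # best[i][v]: minimum weight needed to complete a match from node v
    # placed at pattern position i. Nodes absent from best[i] are pruned.
    best = [dict() for _ in range(n)]
    best[n - 1] = {v: 0 for v, lab in labels.items() if lab == pattern[-1]}
    for i in range(n - 2, -1, -1):
        for v, lab in labels.items():
            if lab != pattern[i]:
                continue
            options = [w + best[i + 1][x]
                       for x, w in succ.get(v, []) if x in best[i + 1]]
            if options:
                best[i][v] = min(options)

    # Heap entries: (cost so far + cheapest completion, cost so far, partial).
    heap = [(bound, 0, (v,)) for v, bound in best[0].items()]
    heapq.heapify(heap)
    while heap:
        bound, cost, partial = heapq.heappop(heap)
        if len(partial) == n:
            yield cost, partial
            continue
        i = len(partial)
        for x, w in succ.get(partial[-1], []):
            if x in best[i]:
                heapq.heappush(
                    heap, (cost + w + best[i][x], cost + w, partial + (x,)))


# Toy heterogeneous network (invented data).
labels = {"a1": "author", "a2": "author",
          "p1": "paper", "p2": "paper", "v1": "venue"}
edges = {("a1", "p1"): 1, ("a1", "p2"): 4, ("a2", "p2"): 1,
         ("p1", "v1"): 2, ("p2", "v1"): 1}

matches = any_k_path_matches(labels, edges, ["author", "paper", "venue"])
first_two = list(islice(matches, 2))   # stop after a "budget" of two results
the_rest = list(matches)               # resume later without recomputation
```

Because the generator keeps its heap between calls, the caller never has to fix k in advance: it can take a few results, stop, and later ask for more, which is the behaviour the abstract describes.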
Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data
As data are increasingly modeled as graphs for expressing complex
relationships, the tree pattern query on graph-structured data has become an
important type of query in real-world applications. Most practical query
languages, such as XQuery and SPARQL, support logical expressions using
logical-AND/OR/NOT operators to define structural constraints of tree patterns.
In this paper, (1) we propose generalized tree pattern queries (GTPQs) over
graph-structured data, which fully support propositional logic of structural
constraints. (2) We make a thorough study of fundamental problems including
satisfiability, containment and minimization, and analyze the computational
complexity and the decision procedures of these problems. (3) We propose a
compact graph representation of intermediate results and a pruning approach to
reduce the size of intermediate results and the number of join operations --
two factors that often impair the efficiency of traditional algorithms for
evaluating tree pattern queries. (4) We present an efficient algorithm for
evaluating GTPQs using 3-hop as the underlying reachability index. (5)
Experiments on both real-life and synthetic data sets demonstrate the
effectiveness and efficiency of our algorithm, which is from several times to orders of
magnitude faster than state-of-the-art algorithms in terms of evaluation time,
even for traditional tree pattern queries with only conjunctive operations.
Comment: 16 page
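
As a rough illustration of what a propositional structural constraint looks like, the sketch below evaluates a one-level pattern: a root label plus an AND/OR/NOT formula over "has a descendant with label X" conditions. It is a simplification, not the GTPQ algorithm from the paper: reachability is computed by plain depth-first search rather than a 3-hop index, and the tuple-based formula encoding and all names are assumptions made for this example.

```python
def reachable_labels(succ, labels, start):
    """Labels of all nodes reachable from start via one or more edges."""
    seen, found = set(), set()
    stack = list(succ.get(start, []))
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        found.add(labels[v])
        stack.extend(succ.get(v, []))
    return found


def holds(formula, found):
    """Evaluate a nested-tuple formula against a set of reachable labels."""
    op = formula[0]
    if op == "has":
        return formula[1] in found
    if op == "not":
        return not holds(formula[1], found)
    if op == "and":
        return all(holds(f, found) for f in formula[1:])
    if op == "or":
        return any(holds(f, found) for f in formula[1:])
    raise ValueError(f"unknown operator: {op}")


def match_roots(succ, labels, root_label, formula):
    """Nodes with root_label whose descendants satisfy the formula."""
    return sorted(v for v, lab in labels.items()
                  if lab == root_label
                  and holds(formula, reachable_labels(succ, labels, v)))


# Toy data (invented): people linked to papers and patents.
labels = {"u1": "person", "u2": "person", "u3": "person",
          "p1": "paper", "p2": "paper", "t1": "patent", "t2": "patent"}
succ = {"u1": ["p1"], "u2": ["p2"], "p2": ["t1"], "u3": ["t2"]}

# "Has a paper descendant AND NOT a patent descendant."
formula = ("and", ("has", "paper"), ("not", ("has", "patent")))
roots = match_roots(succ, labels, "person", formula)
```

Here only `u1` qualifies: `u2` reaches a patent through `p2`, and `u3` reaches no paper. A purely conjunctive tree pattern query cannot express the negated branch, which is the gap that logical operators are meant to close.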
Investigative Simulation: Towards Utilizing Graph Pattern Matching for Investigative Search
This paper proposes the use of graph pattern matching for investigative graph
search, which is the process of searching for and prioritizing persons of
interest who may exhibit part or all of a pattern of suspicious behaviors or
connections. While there are a variety of applications, our principal
motivation is to aid law enforcement in the detection of homegrown violent
extremists. We introduce investigative simulation, which consists of several
necessary extensions to the existing dual simulation graph pattern matching
scheme in order to make it appropriate for intelligence analysts and law
enforcement officials. Specifically, we impose a categorical label structure on
nodes consistent with the nature of indicators in investigations, as well as
prune or complete search results to ensure sensibility and usefulness of
partial matches to analysts. Lastly, we introduce a natural top-k ranking
scheme that can help analysts prioritize investigative efforts. We demonstrate
the performance of investigative simulation on a large real-world dataset.
Comment: 8 pages, 6 figures. Paper to appear in the Fosint-SI 2016 conference
proceedings in conjunction with the 2016 IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining ASONAM 201