16,893 research outputs found
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
There is a growing need for methods which can capture uncertainties and
answer queries over graph-structured data. Two common types of uncertainty are
uncertainty over the attribute values of nodes and uncertainty over the
existence of edges. In this paper, we combine those with identity uncertainty.
Identity uncertainty represents uncertainty over the mapping from objects
mentioned in the data, or references, to the underlying real-world entities. We
propose the notion of a probabilistic entity graph (PEG), a probabilistic graph
model that defines a distribution over possible graphs at the entity level. The
model takes into account node attribute uncertainty, edge existence
uncertainty, and identity uncertainty, and thus enables us to systematically
reason about all three types of uncertainties in a uniform manner. We introduce
a general framework for constructing a PEG given uncertain data at the
reference level and develop highly efficient algorithms to answer subgraph
pattern matching queries in this setting. Our algorithms are based on two novel
ideas: context-aware path indexing and reduction by join-candidates, which
drastically reduce the query search space. A comprehensive experimental
evaluation shows that our approach outperforms baseline implementations by
orders of magnitude
Investigative Simulation: Towards Utilizing Graph Pattern Matching for Investigative Search
This paper proposes the use of graph pattern matching for investigative graph
search, which is the process of searching for and prioritizing persons of
interest who may exhibit part or all of a pattern of suspicious behaviors or
connections. While there are a variety of applications, our principal
motivation is to aid law enforcement in the detection of homegrown violent
extremists. We introduce investigative simulation, which consists of several
necessary extensions to the existing dual simulation graph pattern matching
scheme in order to make it appropriate for intelligence analysts and law
enforcement officials. Specifically, we impose a categorical label structure on
nodes consistent with the nature of indicators in investigations, as well as
prune or complete search results to ensure sensibility and usefulness of
partial matches to analysts. Lastly, we introduce a natural top-k ranking
scheme that can help analysts prioritize investigative efforts. We demonstrate
performance of investigative simulation on a real-world large dataset.Comment: 8 pages, 6 figures. Paper to appear in the Fosint-SI 2016 conference
proceedings in conjunction with the 2016 IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining ASONAM 201
Fast Search for Dynamic Multi-Relational Graphs
Acting on time-critical events by processing ever growing social media or
news streams is a major technical challenge. Many of these data sources can be
modeled as multi-relational graphs. Continuous queries or techniques to search
for rare events that typically arise in monitoring applications have been
studied extensively for relational databases. This work is dedicated to answer
the question that emerges naturally: how can we efficiently execute a
continuous query on a dynamic graph? This paper presents an exact subgraph
search algorithm that exploits the temporal characteristics of representative
queries for online news or social media monitoring. The algorithm is based on a
novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the
structural and semantic characteristics of the underlying multi-relational
graph. The paper concludes with extensive experimentation on several real-world
datasets that demonstrates the validity of this approach.Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM),
201
Opportunistic linked data querying through approximate membership metadata
Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can among others be achieved through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple, rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions allow to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current
times. Detecting adversarial activities, prevention of theft of intellectual
properties and customer data is a high priority for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many of the
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
performed on real online news, network traffic stream and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity agnostic
approaches.Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015
Efficient Subgraph Matching on Billion Node Graphs
The ability to handle large scale graph data is crucial to an increasing
number of applications. Much work has been dedicated to supporting basic graph
operations such as subgraph matching, reachability, regular expression
matching, etc. In many cases, graph indices are employed to speed up query
processing. Typically, most indices require either super-linear indexing time
or super-linear indexing space. Unfortunately, for very large graphs,
super-linear approaches are almost always infeasible. In this paper, we study
the problem of subgraph matching on billion-node graphs. We present a novel
algorithm that supports efficient subgraph matching for graphs deployed on a
distributed memory store. Instead of relying on super-linear indices, we use
efficient graph exploration and massive parallel computing for query
processing. Our experimental results demonstrate the feasibility of performing
subgraph matching on web-scale graph data.Comment: VLDB201
- …