6 research outputs found
Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases
Many studies have been conducted on seeking the efficient solution for
subgraph similarity search over certain (deterministic) graphs due to its wide
application in many fields, including bioinformatics, social network analysis,
and Resource Description Framework (RDF) data management. All these works
assume that the underlying data are certain. However, in reality, graphs are
often noisy and uncertain due to various factors, such as errors in data
extraction, inconsistencies in data integration, and privacy preserving
purposes. Therefore, in this paper, we study subgraph similarity search on
large probabilistic graph databases. Different from previous works assuming
that edges in an uncertain graph are independent of each other, we study the
uncertain graphs where edges' occurrences are correlated. We formally prove
that subgraph similarity search over probabilistic graphs is #P-complete, thus,
we employ a filter-and-verify framework to speed up the search. In the
filtering phase,we develop tight lower and upper bounds of subgraph similarity
probability based on a probabilistic matrix index, PMI. PMI is composed of
discriminative subgraph features associated with tight lower and upper bounds
of subgraph isomorphism probability. Based on PMI, we can sort out a large
number of probabilistic graphs and maximize the pruning capability. During the
verification phase, we develop an efficient sampling algorithm to validate the
remaining candidates. The efficiency of our proposed solutions has been
verified through extensive experiments.Comment: VLDB201
Efficient Subgraph Matching on Billion Node Graphs
The ability to handle large scale graph data is crucial to an increasing
number of applications. Much work has been dedicated to supporting basic graph
operations such as subgraph matching, reachability, regular expression
matching, etc. In many cases, graph indices are employed to speed up query
processing. Typically, most indices require either super-linear indexing time
or super-linear indexing space. Unfortunately, for very large graphs,
super-linear approaches are almost always infeasible. In this paper, we study
the problem of subgraph matching on billion-node graphs. We present a novel
algorithm that supports efficient subgraph matching for graphs deployed on a
distributed memory store. Instead of relying on super-linear indices, we use
efficient graph exploration and massive parallel computing for query
processing. Our experimental results demonstrate the feasibility of performing
subgraph matching on web-scale graph data.Comment: VLDB201
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
There is a growing need for methods which can capture uncertainties and
answer queries over graph-structured data. Two common types of uncertainty are
uncertainty over the attribute values of nodes and uncertainty over the
existence of edges. In this paper, we combine those with identity uncertainty.
Identity uncertainty represents uncertainty over the mapping from objects
mentioned in the data, or references, to the underlying real-world entities. We
propose the notion of a probabilistic entity graph (PEG), a probabilistic graph
model that defines a distribution over possible graphs at the entity level. The
model takes into account node attribute uncertainty, edge existence
uncertainty, and identity uncertainty, and thus enables us to systematically
reason about all three types of uncertainties in a uniform manner. We introduce
a general framework for constructing a PEG given uncertain data at the
reference level and develop highly efficient algorithms to answer subgraph
pattern matching queries in this setting. Our algorithms are based on two novel
ideas: context-aware path indexing and reduction by join-candidates, which
drastically reduce the query search space. A comprehensive experimental
evaluation shows that our approach outperforms baseline implementations by
orders of magnitude