Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling
We propose a new exact method for shortest-path distance queries on
large-scale networks. Our method precomputes distance labels for vertices by
performing a breadth-first search from every vertex. Seemingly too obvious and
too inefficient at first glance, the key ingredient introduced here is pruning
during breadth-first searches. While we can still answer the correct distance
for any pair of vertices from the labels, pruning surprisingly reduces the search
space and the sizes of the labels. Moreover, we show that we can perform 32 or 64
breadth-first searches simultaneously by exploiting bitwise operations. We
experimentally demonstrate that the combination of these two techniques is
efficient and robust on various kinds of large-scale real-world networks. In
particular, our method can handle social networks and web graphs with hundreds
of millions of edges, which are two orders of magnitude larger than the limits
of previous exact methods, with comparable query time to those of previous
methods. Comment: To appear in SIGMOD 201
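The pruning idea in this abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted graph given as an adjacency dictionary and processes vertices in decreasing-degree order (an ordering chosen here for illustration, not stated in the abstract). The names `build_labels` and `query` are hypothetical, and the bit-parallel searches mentioned above are not included.

```python
from collections import deque


def query(labels, s, t):
    """Distance estimate from labels: min over shared hubs of d(s,h) + d(h,t)."""
    a, b = labels[s], labels[t]
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller label
    return min((d + b[h] for h, d in a.items() if h in b), default=float("inf"))


def build_labels(adj):
    """Build distance labels with one pruned BFS per vertex.

    adj maps each vertex to a list of neighbours (undirected, unweighted).
    labels[v][hub] stores the distance between hub and v.
    """
    # Assumed heuristic: start from high-degree vertices first.
    order = sorted(adj, key=lambda v: -len(adj[v]))
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: earlier labels already certify a distance <= d,
            # so u needs no new entry and is not expanded further.
            if query(labels, root, u) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels


if __name__ == "__main__":
    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
    labels = build_labels(graph)
    print(query(labels, 0, 3))
```

A query only intersects two labels, so its cost depends on label sizes rather than on graph size, which is why shrinking labels through pruning matters.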
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
There is a growing need for methods which can capture uncertainties and
answer queries over graph-structured data. Two common types of uncertainty are
uncertainty over the attribute values of nodes and uncertainty over the
existence of edges. In this paper, we combine those with identity uncertainty.
Identity uncertainty represents uncertainty over the mapping from objects
mentioned in the data, or references, to the underlying real-world entities. We
propose the notion of a probabilistic entity graph (PEG), a probabilistic graph
model that defines a distribution over possible graphs at the entity level. The
model takes into account node attribute uncertainty, edge existence
uncertainty, and identity uncertainty, and thus enables us to systematically
reason about all three types of uncertainties in a uniform manner. We introduce
a general framework for constructing a PEG given uncertain data at the
reference level and develop highly efficient algorithms to answer subgraph
pattern matching queries in this setting. Our algorithms are based on two novel
ideas: context-aware path indexing and reduction by join-candidates, which
drastically reduce the query search space. A comprehensive experimental
evaluation shows that our approach outperforms baseline implementations by
orders of magnitude.
Subjectively interesting connecting trees and forests
Consider a large graph or network, and a user-provided set of query vertices between which the user wishes to explore relations. For example, a researcher may want to connect research papers in a citation network, an analyst may wish to connect organized crime suspects in a communication network, or an internet user may want to organize their bookmarks given their location in the world wide web. A natural way to do this is to connect the vertices in the form of a tree structure that is present in the graph. However, in sufficiently dense graphs, most such trees will be large or somehow trivial (e.g. involving high degree vertices) and thus not insightful. Extending previous research, we define and investigate the new problem of mining subjectively interesting trees connecting a set of query vertices in a graph, i.e., trees that are highly surprising to the specific user at hand. Using information theoretic principles, we formalize the notion of interestingness of such trees mathematically, taking into account certain prior beliefs the user has specified about the graph. A remaining problem is efficiently fitting a prior belief model. We show how this can be done for a large class of prior beliefs. Given a specified prior belief model, we then propose heuristic algorithms to find the best trees efficiently. An empirical validation of our methods on large real graphs evaluates the different heuristics and validates the interestingness of the given trees.
K-Reach: Who is in Your Small World
We study the problem of answering k-hop reachability queries in a directed
graph, i.e., whether there exists a directed path of length k, from a source
query vertex to a target query vertex in the input graph. The problem of k-hop
reachability is a generalization of classic reachability (where
k=infinity). Existing indexes for processing classic reachability queries, as
well as for processing shortest path queries, are not applicable or not
efficient for processing k-hop reachability queries. We propose an index for
processing k-hop reachability queries, which is simple in design and efficient
to construct. Our experimental results on a wide range of real datasets show
that our index is more efficient than the state-of-the-art indexes even for
processing classic reachability queries, for which these indexes are primarily
designed. We also show that our index is efficient in answering k-hop
reachability queries. Comment: VLDB201
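For orientation, the query itself can be answered without any index by a depth-limited breadth-first search. The sketch below is that naive baseline, not the index proposed in the paper. It assumes that "k-hop reachable" means a directed path of at most k edges exists, so that leaving k at infinity gives classic reachability. The function name `k_hop_reachable` is hypothetical.

```python
import math
from collections import deque


def k_hop_reachable(adj, source, target, k=math.inf):
    """Return True if target is reachable from source within k directed hops.

    adj maps each vertex to a list of out-neighbours.
    With k left at infinity this is ordinary reachability.
    """
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == k:
            continue  # hop budget exhausted along this branch
        for w in adj.get(u, ()):
            if w == target:
                return True
            if w not in seen:
                seen.add(w)
                frontier.append((w, depth + 1))
    return False


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": []}
    print(k_hop_reachable(graph, "a", "c", k=1))
    print(k_hop_reachable(graph, "a", "c", k=2))
```

Each such query may touch a large part of the graph, which is the cost an index is meant to avoid.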
Efficient Snapshot Retrieval over Historical Graph Data
We address the problem of managing historical data for large evolving
information networks like social networks or citation networks, with the goal
to enable temporal and evolutionary queries and analysis. We present the design
and architecture of a distributed graph database system that stores the entire
history of a network and provides support for efficient retrieval of multiple
graphs from arbitrary time points in the past, in addition to maintaining the
current state for ongoing updates. Our system exposes a general programmatic
API to process and analyze the retrieved snapshots. We introduce DeltaGraph, a
novel, extensible, highly tunable, and distributed hierarchical index structure
that enables compactly recording the historical information, and that supports
efficient retrieval of historical graph snapshots for single-site or parallel
processing. Along with the original graph data, DeltaGraph can also maintain
and index auxiliary information; this functionality can be used to extend the
structure to efficiently execute queries like subgraph pattern matching over
historical data. We develop analytical models for both the storage space needed
and the snapshot retrieval times to aid in choosing the right parameters for a
specific scenario. In addition, we present strategies for materializing
portions of the historical graph state in memory to further speed up the
retrieval process. We also present an in-memory graph data structure
called GraphPool that can maintain hundreds of historical graph instances in
main memory in a non-redundant manner. We present a comprehensive experimental
evaluation that illustrates the effectiveness of our proposed techniques at
managing historical graph information.
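The general idea of retrieving a past snapshot from recorded changes can be sketched as follows. This is a deliberately flat illustration, not the hierarchical DeltaGraph index: it assumes a time-ordered log of edge additions and removals plus periodic full checkpoints, and rebuilds a snapshot by replaying the log from the nearest earlier checkpoint. The class `SnapshotStore` and all of its members are hypothetical names.

```python
import bisect


class SnapshotStore:
    """Edge-event log with periodic checkpoints (illustrative only)."""

    def __init__(self, checkpoint_every=2):
        self.checkpoint_every = checkpoint_every
        self.times = []        # event timestamps, non-decreasing
        self.events = []       # (op, edge) with op in {"add", "remove"}
        self.cp_index = []     # number of events covered by each checkpoint
        self.cp_edges = []     # frozen edge set at each checkpoint
        self.current = set()   # current graph state for ongoing updates

    def apply(self, time, op, edge):
        """Record an event (times must be appended in order) and update state."""
        if op == "add":
            self.current.add(edge)
        else:
            self.current.discard(edge)
        self.times.append(time)
        self.events.append((op, edge))
        if len(self.events) % self.checkpoint_every == 0:
            self.cp_index.append(len(self.events))
            self.cp_edges.append(frozenset(self.current))

    def snapshot(self, time):
        """Return the edge set as it was after all events with timestamp <= time."""
        n = bisect.bisect_right(self.times, time)    # events to include
        i = bisect.bisect_right(self.cp_index, n) - 1  # latest usable checkpoint
        if i >= 0:
            start, edges = self.cp_index[i], set(self.cp_edges[i])
        else:
            start, edges = 0, set()
        for op, edge in self.events[start:n]:
            if op == "add":
                edges.add(edge)
            else:
                edges.discard(edge)
        return edges


if __name__ == "__main__":
    store = SnapshotStore()
    store.apply(1, "add", ("a", "b"))
    store.apply(2, "add", ("b", "c"))
    store.apply(3, "remove", ("a", "b"))
    print(store.snapshot(2))
```

The checkpoint interval trades storage for retrieval time: more checkpoints mean fewer events to replay per query, at the cost of storing more full copies.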