9,629 research outputs found
Processing Regular Path Queries on Arbitrarily Distributed Data
Regular Path Queries (RPQs) are a type of graph query where answers are pairs
of nodes connected by a sequence of edges matching a regular expression. We
study the techniques to process such queries on a distributed graph of data.
While many techniques assume the location of each data element (node or edge)
is known, when the components of the distributed system are autonomous, the
data will be arbitrarily distributed. As the different query processing
strategies are equivalently costly in the worst case, we isolate
query-dependent cost factors and present a method to choose between strategies,
using new query cost estimation techniques. We evaluate our techniques using
meaningful queries on biomedical data
Performance Guarantees for Distributed Reachability Queries
In the real world a graph is often fragmented and distributed across
different sites. This highlights the need for evaluating queries on distributed
graphs. This paper proposes distributed evaluation algorithms for three classes
of queries: reachability for determining whether one node can reach another,
bounded reachability for deciding whether there exists a path of a bounded
length between a pair of nodes, and regular reachability for checking whether
there exists a path connecting two nodes such that the node labels on the path
form a string in a given regular expression. We develop these algorithms based
on partial evaluation, to explore parallel computation. When evaluating a query
Q on a distributed graph G, we show that these algorithms possess the following
performance guarantees, no matter how G is fragmented and distributed: (1) each
site is visited only once; (2) the total network traffic is determined by the
size of Q and the fragmentation of G, independent of the size of G; and (3) the
response time is decided by the largest fragment of G rather than the entire G.
In addition, we show that these algorithms can be readily implemented in the
MapReduce framework. Using synthetic and real-life data, we experimentally
verify that these algorithms are scalable on large graphs, regardless of how
the graphs are distributed.Comment: VLDB201
TAPER: query-aware, partition-enhancement for large, heterogenous, graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given pattern matching queries workload. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.Comment: 12 pages, 11 figures, unpublishe
Reasoning about Independence in Probabilistic Models of Relational Data
We extend the theory of d-separation to cases in which data instances are not
independent and identically distributed. We show that applying the rules of
d-separation directly to the structure of probabilistic models of relational
data inaccurately infers conditional independence. We introduce relational
d-separation, a theory for deriving conditional independence facts from
relational models. We provide a new representation, the abstract ground graph,
that enables a sound, complete, and computationally efficient method for
answering d-separation queries about relational models, and we present
empirical results that demonstrate effectiveness.Comment: 61 pages, substantial revisions to formalisms, theory, and related
wor
Route Planning in Transportation Networks
We survey recent advances in algorithms for route planning in transportation
networks. For road networks, we show that one can compute driving directions in
milliseconds or less even at continental scale. A variety of techniques provide
different trade-offs between preprocessing effort, space requirements, and
query time. Some algorithms can answer queries in a fraction of a microsecond,
while others can deal efficiently with real-time traffic. Journey planning on
public transportation systems, although conceptually similar, is a
significantly harder problem due to its inherent time-dependent and
multicriteria nature. Although exact algorithms are fast enough for interactive
queries on metropolitan transit systems, dealing with continent-sized instances
requires simplifications or heavy preprocessing. The multimodal route planning
problem, which seeks journeys combining schedule-based transportation (buses,
trains) with unrestricted modes (walking, driving), is even harder, relying on
approximate solutions even for metropolitan inputs.Comment: This is an updated version of the technical report MSR-TR-2014-4,
previously published by Microsoft Research. This work was mostly done while
the authors Daniel Delling, Andrew Goldberg, and Renato F. Werneck were at
Microsoft Research Silicon Valle
- …