6,251 research outputs found
Evaluation and Enumeration Problems for Regular Path Queries
Regular path queries (RPQs) are a central component of graph databases. We investigate decision- and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question if a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths
Joining Extractions of Regular Expressions
Regular expressions with capture variables, also known as "regex formulas,"
extract relations of spans (interval positions) from text. These relations can
be further manipulated via Relational Algebra as studied in the context of
document spanners, Fagin et al.'s formal framework for information extraction.
We investigate the complexity of querying text by Conjunctive Queries (CQs) and
Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds
(NP-completeness and W[1]-hardness) from the relational world also hold in our
setting; in particular, hardness hits already single-character text! Yet, the
upper bounds from the relational world do not carry over. Unlike the relational
world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source
of hardness is that it may be intractable to instantiate the relation defined
by a regex formula, simply because it has an exponential number of tuples. Yet,
we are able to establish general upper bounds. In particular, UCQs can be
evaluated with polynomial delay, provided that every CQ has a bounded number of
atoms (while unions and projection can be arbitrary). Furthermore, UCQ
evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the
parameter is the size of the UCQ
Shortest Distances as Enumeration Problem
We investigate the single source shortest distance (SSSD) and all pairs
shortest distance (APSD) problems as enumeration problems (on unweighted and
integer weighted graphs), meaning that the elements -- where
and are vertices with shortest distance -- are produced and
listed one by one without repetition. The performance is measured in the RAM
model of computation with respect to preprocessing time and delay, i.e., the
maximum time that elapses between two consecutive outputs. This point of view
reveals that specific types of output (e.g., excluding the non-reachable pairs
, or excluding the self-distances ) and the order of
enumeration (e.g., sorted by distance, sorted row-wise with respect to the
distance matrix) have a huge impact on the complexity of APSD while they appear
to have no effect on SSSD.
In particular, we show for APSD that enumeration without output restrictions
is possible with delay in the order of the average degree. Excluding
non-reachable pairs, or requesting the output to be sorted by distance,
increases this delay to the order of the maximum degree. Further, for weighted
graphs, a delay in the order of the average degree is also not possible without
preprocessing or considering self-distances as output. In contrast, for SSSD we
find that a delay in the order of the maximum degree without preprocessing is
attainable and unavoidable for any of these requirements.Comment: Updated version adds the study of space complexit
A Trichotomy for Regular Trail Queries
Regular path queries (RPQs) are an essential component of graph query languages. Such queries consider a regular expression r and a directed edge-labeled graph G and search for paths in G for which the sequence of labels is in the language of r. In order to avoid having to consider infinitely many paths, some database engines restrict such paths to be trails, that is, they only consider paths without repeated edges. In this paper we consider the evaluation problem for RPQs under trail semantics, in the case where the expression is fixed. We show that, in this setting, there exists a trichotomy. More precisely, the complexity of RPQ evaluation divides the regular languages into the finite languages, the class T_tract (for which the problem is tractable), and the rest. Interestingly, the tractable class in the trichotomy is larger than for the trichotomy for simple paths, discovered by Bagan et al. [Bagan et al., 2013]. In addition to this trichotomy result, we also study characterizations of the tractable class, its expressivity, the recognition problem, closure properties, and show how the decision problem can be extended to the enumeration problem, which is relevant to practice
Enumerating Subgraph Instances Using Map-Reduce
The theme of this paper is how to find all instances of a given "sample"
graph in a larger "data graph," using a single round of map-reduce. For the
simplest sample graph, the triangle, we improve upon the best known such
algorithm. We then examine the general case, considering both the communication
cost between mappers and reducers and the total computation cost at the
reducers. To minimize communication cost, we exploit the techniques of (Afrati
and Ullman, TKDE 2011)for computing multiway joins (evaluating conjunctive
queries) in a single map-reduce round. Several methods are shown for
translating sample graphs into a union of conjunctive queries with as few
queries as possible. We also address the matter of optimizing computation cost.
Many serial algorithms are shown to be "convertible," in the sense that it is
possible to partition the data graph, explore each partition in a separate
reducer, and have the total computation cost at the reducers be of the same
order as the computation cost of the serial algorithm.Comment: 37 page
Dynamic Set Intersection
Consider the problem of maintaining a family of dynamic sets subject to
insertions, deletions, and set-intersection reporting queries: given , report every member of in any order. We show that in the word
RAM model, where is the word size, given a cap on the maximum size of
any set, we can support set intersection queries in
expected time, and updates in expected time. Using this algorithm
we can list all triangles of a graph in
expected time, where and
is the arboricity of . This improves a 30-year old triangle enumeration
algorithm of Chiba and Nishizeki running in time.
We provide an incremental data structure on that supports intersection
{\em witness} queries, where we only need to find {\em one} .
Both queries and insertions take O\paren{\sqrt \frac{N}{w/\log^2 w}} expected
time, where . Finally, we provide time/space tradeoffs for
the fully dynamic set intersection reporting problem. Using words of space,
each update costs expected time, each reporting query
costs expected time where
is the size of the output, and each witness query costs expected time.Comment: Accepted to WADS 201
- …